Multivariate analysis of 269 HapMap subjects and 1 million SNPs using ‘Taxonomy3’

Abstract

‘Taxonomy3’ is a novel mathematical method for the multivariate analysis of complex datasets. It is based on correlations of individualized divergences, named Log Bayes Factors (LBFs), and on their eigendecomposition. We applied this method to 269 subjects of the HapMap project (African, Caucasian, Chinese and Japanese) genotyped for more than 1 million SNPs (Illumina 1Mduo chip). We used newly developed software able to analyse such large datasets efficiently. The results show significant distinctions between ethnic groups and identify sets of markers that are important for these distinctions. Multivariate models based on all available SNPs, assessed by internal leave-one-out cross-validation, accurately predict subjects’ ethnicity. This confirms the benefits of the new method: powerful signal detection with a small number of subjects, sub-group identification that facilitates personalized medicine, and the ability to build multivariate predictive models using the whole genome. We intend to apply this method and software to other large and complex disease and pharmacogenetic datasets.
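
The abstract does not define how LBFs are computed, so the following is only a rough sketch of the general idea of correlating per-subject scores and taking an eigendecomposition, not the Taxonomy3 implementation. The function names `placeholder_scores` and `subject_eigen`, the pseudocount, and the scoring rule itself (log of the sample frequency of each subject's genotype relative to a uniform 1/3) are assumptions made purely for illustration.

```python
import numpy as np


def placeholder_scores(genotypes, pseudocount=0.5):
    """Hypothetical stand-in for LBFs (NOT the Taxonomy3 definition).

    genotypes: integer array of shape (subjects, snps) with values 0, 1 or 2.
    Returns, for each subject and SNP, the log of the smoothed sample
    frequency of that subject's genotype relative to a uniform 1/3.
    """
    n_subjects, n_snps = genotypes.shape
    # Count each genotype class per SNP: shape (3, snps).
    counts = np.stack([(genotypes == g).sum(axis=0) for g in range(3)])
    freqs = (counts + pseudocount) / (n_subjects + 3 * pseudocount)
    # Look up the frequency of each subject's own genotype at each SNP.
    own_freq = freqs[genotypes, np.arange(n_snps)]
    return np.log(own_freq * 3.0)


def subject_eigen(scores):
    """Eigendecomposition of the subject-by-subject correlation matrix.

    Returns eigenvalues in descending order and the matching eigenvectors
    (one column per component).
    """
    corr = np.corrcoef(scores)              # rows are subjects
    eigvals, eigvecs = np.linalg.eigh(corr)  # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]
```

In a sketch like this, the leading eigenvectors give each subject a small set of coordinates in which groups of similar subjects may cluster. Whether they do depends entirely on the scoring rule, which here is only a placeholder.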

Publication
In the 6th Joint Cold Spring Harbor/Wellcome Trust Conference ‘Pharmacogenomics and Personalized Medicine’