Applications of Machine Learning and Data Mining Methods to Detect Associations of Rare and Common Variants with Complex Traits

Bibliographic Details
Published in	Genetic epidemiology Vol. 38; no. S1; pp. S81 - S85
Main Authors	Lu, Ake Tzu-Hui; Austin, Erin; Bonner, Ashley; Huang, Hsin-Hsiung; Cantor, Rita M.
Format	Journal Article
Language	English
Published	United States: Blackwell Publishing Ltd, 01.09.2014
Subjects	Artificial Intelligence; Blood Pressure - genetics; Data Mining; Genetic Variation; Genotype; Humans; machine learning methods; Models, Genetic; Pedigree; penalized regression; permanental classification; Phenotype; Polymorphism, Single Nucleotide; Principal Component Analysis; rare variants; sparse graphical model; sparse principal components; Support Vector Machine
Summary:	Machine learning methods (MLMs), designed to develop models using high‐dimensional predictors, have been used to analyze genome‐wide genetic and genomic data to predict risks for complex traits. We summarize the results from six contributions to our Genetic Analysis Workshop 18 working group; these investigators applied MLMs and data mining to analyses of rare and common genetic variants measured in pedigrees. To develop risk profiles, group members analyzed blood pressure traits along with single‐nucleotide polymorphisms and rare variant genotypes derived from sequence and imputation analyses in large Mexican American pedigrees. Supervised MLMs included penalized regression with varying penalties, support vector machines, and permanental classification. Unsupervised MLMs included sparse principal components analysis and sparse graphical models. Entropy‐based components analyses were also used to mine these data. None of the investigators fully capitalized on the genetic information provided by the complete pedigrees. Their approaches either corrected for the nonindependence of the individuals within the pedigrees or analyzed only those who were independent. Some methods allowed for covariate adjustment, whereas others did not. We evaluated these methods using a variety of metrics. Four contributors conducted primary analyses on the real data, and the other two research groups used the simulated data with and without knowledge of the underlying simulation model. One group used the answers to the simulated data to assess power and type I errors. Although the MLMs applied were substantially different, each research group concluded that MLMs have advantages over standard statistical approaches with these high‐dimensional data.
Bibliography:	ArticleID:GEPI21830; istex:B259041E218C0A92E53584ED3FBC2D085D073FE8; ark:/67375/WNG-VXXSNDCL-W; National Institutes of Health - No. R01 GM031575; Database and Statistics Core - No. HL-28481
ISSN:	0741-0395, 1098-2272
DOI:	10.1002/gepi.21830