Stratified polygenic risk prediction model with application to CAGI bipolar disorder sequencing data

Bibliographic Details
Published in	Human mutation Vol. 38; no. 9; pp. 1235-1239
Main Authors	Wang, Maggie Haitian; Chang, Billy; Sun, Rui; Hu, Inchi; Xia, Xiaoxuan; Wu, William Ka Kei; Chong, Ka Chun; Zee, Benny Chung‐Ying
Format	Journal Article
Language	English
Published	United States Hindawi Limited 01.09.2017
Subjects	Algorithms; bipolar; Bipolar disorder; Bipolar Disorder - genetics; Classification; classification of complex disorder; Data processing; disease prediction; Epistasis; Epistasis, Genetic; Genetic markers; Genetic Predisposition to Disease; Genomes; Heritability; Humans; interaction effect; Models, Genetic; mutation; polygenic risk stratification; Polymorphism, Single Nucleotide; Sequence Analysis, DNA - methods; W‐test
Summary:	Genetic data consist of a wide range of marker types, including common, low‐frequency, and rare variants. Multiple genetic markers and their interactions play central roles in the heritability of complex disease. In this study, we propose an algorithm that uses a variable selection design stratified by genetic architecture and interaction effects, achieved by a dataset‐adaptive W‐test. The polygenic sets in all strata were integrated to form a classification rule. The algorithm was applied to the Critical Assessment of Genome Interpretation (CAGI) 4 bipolar challenge sequencing data. The prediction accuracy was 60% using genetic markers on an independent test set. We found that epistasis among common genetic variants contributed most substantially to prediction precision. However, the sample size was not large enough to draw conclusions about the lack of predictability of low‐frequency variants and their epistasis. This study proposed to perform complex trait prediction using a stratified design. The genetic data are divided into strata according to genetic architecture, and feature selection is conducted within each stratum through a data‐adaptive W‐test for main effects and pairwise interactions. An ensemble classification algorithm can then be applied to integrate the selected features and perform prediction. Application to the CAGI data set showed that including interaction effects of common variants improved prediction accuracy.
Bibliography:	For the CAGI Special Issue. Contract grant sponsors: National Science Foundation of China (81473035, 31401124); National Institute of Health (U41 HG007346, R13 HG006650); Health and Medical Research Fund of Food and Health Bureau of the Hong Kong Special Administrative Region Government (project reference no: CU‐16‐C11). Co-corresponding authors
ISSN:	1059-7794, 1098-1004
DOI:	10.1002/humu.23229
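
Illustrative sketch: The stratified selection design described in the Summary might look roughly like the Python below. It is not the authors' implementation. The allele-frequency cut-offs (0.01 and 0.05), the function names, and the use of a plain Pearson chi-squared statistic in place of the W‐test are all assumptions made for illustration; the record does not specify them. The ensemble classification step is not shown.

```python
# Illustrative sketch only. The strata cut-offs, the chi-squared score, and
# every function name here are assumptions; this is NOT the paper's W-test.
from collections import Counter


def minor_allele_frequency(genotypes):
    """Genotypes are minor-allele counts (0, 1, or 2), one per individual."""
    return sum(genotypes) / (2 * len(genotypes))


def stratum_of(maf, low=0.01, common=0.05):
    """Assign a variant to a stratum by allele frequency (assumed cut-offs)."""
    if maf >= common:
        return "common"
    if maf >= low:
        return "low_frequency"
    return "rare"


def chi_squared(genotypes, labels):
    """Pearson chi-squared statistic for a genotype-by-status table."""
    n = len(labels)
    cell = Counter(zip(genotypes, labels))
    row = Counter(genotypes)
    col = Counter(labels)
    stat = 0.0
    for g in row:
        for y in col:
            expected = row[g] * col[y] / n
            stat += (cell[(g, y)] - expected) ** 2 / expected
    return stat


def pair_genotypes(a, b):
    """Encode a pairwise interaction as one combined genotype per individual."""
    return list(zip(a, b))


def select_by_stratum(variants, labels, top_k=1):
    """Keep the top_k highest-scoring variants within each stratum."""
    strata = {}
    for name, genotypes in variants.items():
        s = stratum_of(minor_allele_frequency(genotypes))
        strata.setdefault(s, []).append((chi_squared(genotypes, labels), name))
    return {
        s: [name for _, name in sorted(scored, reverse=True)[:top_k]]
        for s, scored in strata.items()
    }
```

A pairwise interaction can be scored with the same statistic by passing `pair_genotypes(a, b)` as the genotype vector. The features selected in each stratum would then be handed to whichever ensemble classifier is chosen for the integration step.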