A Hybrid Likelihood Model for Sequence-Based Disease Association Studies. e1003224

Bibliographic Details
Published in: PLoS Genetics, Vol. 9, no. 1
Main Authors: Chen, Yun-Ching; Carter, Hannah; Parla, Jennifer; Kramer, Melissa; Goes, Fernando S; Pirooznia, Mehdi; Zandi, Peter P; McCombie, W Richard; Potash, James B; Karchin, Rachel
Format: Journal Article
Language: English
Published: 01.01.2013
Summary: In the past few years, case-control studies of common diseases have shifted their focus from single genes to whole exomes. New sequencing technologies now routinely detect hundreds of thousands of sequence variants in a single study, many of which are rare or even novel. The limitations of classical single-marker association analysis for rare variants have been a challenge in such studies, and a new generation of statistical methods for case-control association studies has been developed to meet this challenge. A common approach to association analysis of rare variants is burden-style collapsing, which combines rare variant data within individuals across or within genes. Here, we propose a new hybrid likelihood model that combines a burden test with a test of the position distribution of variants. In extensive simulations and on empirical data from the Dallas Heart Study, the new model demonstrates consistently good power, in particular when applied to a gene set (e.g., multiple candidate genes with shared biological function or pathway), when rare variants cluster in key functional regions of a gene, and when protective variants are present. When applied to data from an ongoing sequencing study of bipolar disorder (191 cases, 107 controls), the model identifies seven gene sets with nominal p-values < 0.05, of which one, the MAPK signaling pathway (KEGG), reaches trend-level significance after correcting for multiple testing.
ISSN: 1553-7404
DOI: 10.1371/journal.pgen.1003224