Quality control and quality assurance in genotypic data for genome-wide association studies

Bibliographic Details
Published in Genetic Epidemiology Vol. 34; no. 6; pp. 591-602
Main Authors Laurie, Cathy C., Doheny, Kimberly F., Mirel, Daniel B., Pugh, Elizabeth W., Bierut, Laura J., Bhangale, Tushar, Boehm, Frederick, Caporaso, Neil E., Cornelis, Marilyn C., Edenberg, Howard J., Gabriel, Stacy B., Harris, Emily L., Hu, Frank B., Jacobs, Kevin B., Kraft, Peter, Landi, Maria Teresa, Lumley, Thomas, Manolio, Teri A., McHugh, Caitlin, Painter, Ian, Paschall, Justin, Rice, John P., Rice, Kenneth M., Zheng, Xiuwen, Weir, Bruce S.
Format Journal Article
Language English
Published Hoboken: Wiley Subscription Services, Inc., A Wiley Company, 01.09.2010
Summary: Genome‐wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome‐wide association studies (GWAS). This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy‐Weinberg equilibrium test P‐values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis to SNP selection. The methods are illustrated with examples from the “Gene Environment Association Studies” (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of GWAS. Genet. Epidemiol. 34: 591–602, 2010. © 2010 Wiley‐Liss, Inc.
Bibliography: Intramural Research Program of the NIH, National Library of Medicine
NIAAA - No. U10AA008401
NIH GEI - No. HG-06-033-NCI-01; No. U01HG04424; No. U01HG004438
istex:17FFBFB25DF8C52A3AFFB7665398989D83224050
ark:/67375/WNG-GXGVL80W-J
ArticleID:GEPI20516
NIH - No. U01HG004422; No. U01HG004399; No. HHSN268200782096C
NIDA - No. P01CA089392; No. R01DA013423
ISSN: 0741-0395
1098-2272
DOI: 10.1002/gepi.20516