A framework for variation discovery and genotyping using next-generation DNA sequencing data


Bibliographic Details
Published in Nature Genetics, Vol. 43, no. 5, pp. 491-498
Main Authors DePristo, Mark A, Banks, Eric, Poplin, Ryan, Garimella, Kiran V, Maguire, Jared R, Hartl, Christopher, Philippakis, Anthony A, del Angel, Guillermo, Rivas, Manuel A, Hanna, Matt, McKenna, Aaron, Fennell, Tim J, Kernytsky, Andrew M, Sivachenko, Andrey Y, Cibulskis, Kristian, Gabriel, Stacey B, Altshuler, David, Daly, Mark J
Format Journal Article
Language English
Published New York: Nature Publishing Group US, 01.05.2011
Summary: Mark DePristo and colleagues report an analytical framework to discover and genotype variation using whole exome and genome resequencing data from next-generation sequencing technologies. They apply these methods to low-pass population sequencing data from the 1000 Genomes Project. Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
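Illustrative note on step (iii): base quality scores are conventionally reported on the Phred scale, Q = -10 log10(p), where p is the probability that a base call is wrong. The sketch below shows how an empirical quality could be derived from observed mismatch counts. It is not taken from the article or from the Genome Analysis Toolkit; the function names and the +1/+2 pseudocount smoothing are assumptions made for this example only.

```python
import math


def phred_from_error_rate(p_error: float) -> float:
    """Convert an error probability to a Phred-scaled quality score."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in the interval (0, 1]")
    return -10.0 * math.log10(p_error)


def empirical_quality(mismatches: int, observations: int) -> float:
    """Phred-scaled quality estimated from mismatch counts.

    Hypothetical helper: the +1/+2 pseudocounts keep the estimate finite
    when no mismatches are observed. This smoothing is an assumption of
    the sketch, not the published method.
    """
    if observations < 0 or not 0 <= mismatches <= observations:
        raise ValueError("require 0 <= mismatches <= observations")
    smoothed_rate = (mismatches + 1) / (observations + 2)
    return phred_from_error_rate(smoothed_rate)


if __name__ == "__main__":
    # A group of bases reported at one quality can be compared against
    # the quality implied by how often those bases disagree with the reference.
    print(round(empirical_quality(mismatches=9, observations=98), 2))
```

Under these assumptions, a group of bases whose empirical quality is well below the quality reported by the sequencer would be a candidate for downward adjustment during recalibration.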
ISSN: 1061-4036, 1546-1718
DOI: 10.1038/ng.806