Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads

Bibliographic Details
Published in Nature Methods Vol. 18; no. 11; pp. 1322-1332
Main Authors Shafin, Kishwar; Pesout, Trevor; Chang, Pi-Chuan; Nattestad, Maria; Kolesnikov, Alexey; Goel, Sidharth; Baid, Gunjan; Kolmogorov, Mikhail; Eizenga, Jordan M.; Miga, Karen H.; Carnevali, Paolo; Jain, Miten; Carroll, Andrew; Paten, Benedict
Format Journal Article
Language English
Published New York Nature Publishing Group US 01.11.2021
Nature Publishing Group
Summary: Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished). The PEPPER-Margin-DeepVariant pipeline achieves highly accurate variant calling using nanopore and other long-read sequencing data.
Author Contributions
B.P. and A.C. designed and executed the study. K.S. developed PEPPER. T.P. developed Margin. P.C. designed candidate import functionality in DeepVariant. K.S., T.P. and P.C. contributed equally to the methods development and core analysis presented. M.N. designed alt-event alignment in DeepVariant; A.K. contributed to haplotype sorting and improvements on DeepVariant runtime; S.G. contributed to the candidate import module of DeepVariant; G.B. designed and executed the post-processing model to improve multiallelic variant accuracy. M.K. designed and evaluated assembly polishing. J.M.E. designed the local phasing metric and contributed to phasing evaluation. K.H.M. provided experimental design guidance; P.C. generated assemblies and provided guidance on assembly polishing. M.J. performed nanopore sequencing and quality control, and helped to design and execute analysis. All authors approve of the final manuscript.
ISSN:1548-7091
1548-7105
DOI:10.1038/s41592-021-01299-w