Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies


Bibliographic Details
Published in: Nature Methods Vol. 19; no. 6; pp. 687-695
Main Authors: Mc Cartney, Ann M., Shafin, Kishwar, Alonge, Michael, Bzikadze, Andrey V., Formenti, Giulio, Fungtammasan, Arkarachai, Howe, Kerstin, Jain, Chirag, Koren, Sergey, Logsdon, Glennis A., Miga, Karen H., Mikheenko, Alla, Paten, Benedict, Shumate, Alaina, Soto, Daniela C., Sović, Ivan, Wood, Jonathan M. D., Zook, Justin M., Phillippy, Adam M., Rhie, Arang
Format: Journal Article
Language: English
Published: New York: Nature Publishing Group US, 01.06.2022

Summary: Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9, as measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies. The work describes the validation and polishing strategies developed by the telomere-to-telomere consortium for evaluating and improving the first complete human genome assembly.
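The quality value (QV) cited in the summary is a Phred-scaled per-base accuracy estimated from k-mers. The sketch below assumes a Merqury-style estimate (k-mers found in the assembly but absent from the read k-mer set are treated as evidence of errors). It is an illustration under that assumption, not necessarily the exact calculation used in this work, and the function names are hypothetical.

```python
import math


def kmer_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Estimate consensus QV from k-mer counts (assumed Merqury-style formula).

    asm_only_kmers: k-mers present in the assembly but absent from the reads
    total_asm_kmers: all k-mers counted in the assembly
    k: k-mer length used for counting
    """
    if total_asm_kmers <= 0 or k <= 0:
        raise ValueError("total_asm_kmers and k must be positive")
    if not 0 <= asm_only_kmers < total_asm_kmers:
        raise ValueError("need 0 <= asm_only_kmers < total_asm_kmers")
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers: the estimate is unbounded
    frac_bad = asm_only_kmers / total_asm_kmers
    # P(base correct) = (1 - frac_bad) ** (1 / k); log1p/expm1 preserve
    # precision when frac_bad is tiny, as in a near-complete assembly.
    error_rate = -math.expm1(math.log1p(-frac_bad) / k)
    return -10.0 * math.log10(error_rate)


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled QV into an expected per-base error rate."""
    return 10.0 ** (-qv / 10.0)


if __name__ == "__main__":
    # Compare the two QVs reported in the summary.
    for qv in (70.2, 73.9):
        err = qv_to_error_rate(qv)
        print(f"QV {qv}: ~1 error per {1 / err / 1e6:.1f} Mb")
```

Because QV is logarithmic, the reported change from 70.2 to 73.9 corresponds to going from roughly one estimated error per 10.5 Mb to roughly one per 24.5 Mb.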
AUTHOR CONTRIBUTION STATEMENTS
These authors contributed equally
AR and AMP conceived and supervised the project. AMM, KS, GF, KH, JMDW, and AR performed the pre-polishing evaluation. KS, MA, AVB, AF, CJ, AM, BP, and AR aligned reads and called variants. AMM, KS, MA, GF, AF, KHM, AM, JMZ, and AR manually validated variant calls. DCS and JMZ performed the gene collapse and expansion analysis. KS, MA, AVB, GAL, KHM, AM, and AR identified and curated heterozygous and “issues” loci. KS, MA, SK, and BP patched and polished the telomeres. AMM, MA, AS, and IS performed automated polishing. AMM, KS, MA, and AR wrote the manuscript, with assistance from all authors. All authors approved the final manuscript.
ISSN: 1548-7091; 1548-7105
DOI: 10.1038/s41592-022-01440-3