KoVariome: Korean National Standard Reference Variome database of whole genomes with comprehensive SNV, indel, CNV, and SV analyses

Bibliographic Details
Published in Scientific Reports Vol. 8; no. 1; pp. 5677 - 14
Main Authors Kim, Jungeun; Weber, Jessica A; Jho, Sungwoong; Jang, Jinho; Jun, JeHoon; Cho, Yun Sung; Kim, Hak-Min; Kim, Hyunho; Kim, Yumi; Chung, OkSung; Kim, Chang Geun; Lee, HyeJin; Kim, Byung Chul; Han, Kyudong; Koh, InSong; Chae, Kyun Shik; Lee, Semin; Edwards, Jeremy S; Bhak, Jong
Format Journal Article
Language English
Published England Nature Publishing Group 04.04.2018
Nature Publishing Group UK
Summary: High-coverage whole-genome sequencing data from a single ethnicity can provide a useful catalogue of population-specific genetic variations and a critical resource for more accurately identifying pathogenic genetic variants. We report a comprehensive analysis of the Korean population and present the Korean National Standard Reference Variome (KoVariome). As part of the Korean Personal Genome Project (KPGP), we constructed the KoVariome database using 5.5 terabases of whole-genome sequence data from 50 healthy Korean individuals in order to characterize the benign ethnicity-relevant genetic variation present in the Korean population. In total, KoVariome includes 12.7M single-nucleotide variants (SNVs), 1.7M short insertions and deletions (indels), 4K structural variations (SVs), and 3.6K copy number variations (CNVs). Among them, 2.4M (19%) SNVs and 0.4M (24%) indels were identified as novel. We also discovered selective enrichment of 3.8M SNVs and 0.5M indels in Korean individuals, which were used to filter out 1,271 coding SNVs that filtering against the 1000 Genomes Project had not removed when prioritizing disease-causing variants. KoVariome health records were used to identify novel disease-causing variants in the Korean population, demonstrating the value of high-quality ethnic variation databases for the accurate interpretation of individual genomes and the precise characterization of genetic variations.
ISSN:2045-2322
DOI:10.1038/s41598-018-23837-x