Genotype imputation from low-coverage data for medical and population genetic analyses

Bibliographic Details
Published in	Genome research Vol. 35; no. 9; pp. 1929 - 1941
Main Authors	Biagini, Simone Andrea, Becelaere, Sara, Aerden, Mio, Jatsenko, Tatjana, Hannes, Laurens, Van Damme, Philip, Breckpot, Jeroen, Devriendt, Koenraad, Thienpont, Bernard, Vermeesch, Joris Robert, Cleynen, Isabelle, Kivisild, Toomas
Format	Journal Article
Language	English
Published	United States: Cold Spring Harbor Laboratory Press, 01.09.2025
Subjects	Body Height - genetics; Female; Filters; Genetic analysis; Genetics, Population - methods; Genome-Wide Association Study - methods; Genomic analysis; Genotype; Genotype & phenotype; Genotypes; Humans; Multifactorial Inheritance; Polymorphism, Single Nucleotide; Population genetics; Pregnancy; Principal components analysis
ISSN	1088-9051, 1549-5469
DOI	10.1101/gr.280175.124

Summary:	Genotype imputation from low-pass sequencing data presents unique opportunities for genomic analyses but comes with specific challenges. In this study, we explore the impact of quality filters on genetic ancestry and Polygenic Score (PGS) estimation after imputing 32,769 low-pass genome-wide sequences (LPS) from noninvasive prenatal screening (NIPS) with an average autosomal sequence depth of ∼0.15×. In studies involving ultra-low coverage sequences, conventional approaches to secure genotype accuracy may fail, especially when multiple samples are pooled. To enhance the proportion of high-quality genotypes in large data sets, we introduce a filtering approach called GDI that combines genotype probability (GP), alternate allele dosage (DS), and INFO score filters. We demonstrate that the imputation tools QUILT and GLIMPSE2 achieve similar accuracy, which is high enough for broad-scale ancestry mapping but insufficient for high-resolution principal component analysis (PCA) when applied without filters. With the GDI approach, we can achieve quality that is adequate for such purposes. Furthermore, we explored the impact of imputation errors, choice of variants, and filtering methods on PGS prediction for height in 1911 subjects with height data. We show that polygenic scores predict 23.7% of variance in height in our imputed data and that, contrary to the effect on PCA, the GDI filter does not improve the performance of PGS in height prediction. These results highlight that imputed LPS data can be leveraged for further biomedical and population genetic use, but there is a need to consider each downstream analysis tool individually for its imputation quality thresholds and filtering requirements.
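The summary states only that GDI combines GP, DS, and INFO score filters; it gives neither the thresholds nor the exact rule for combining them. The following is a minimal sketch of what such a combined filter might look like, not the authors' implementation. The class name `ImputedCall`, the function name `gdi_like_filter`, all three threshold values, and the DS consistency rule (dosage must lie close to the called genotype's alternate allele count) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical thresholds: the record does not state the values used in the paper.
MIN_GP = 0.9       # minimum probability of the most likely genotype
MAX_DS_GAP = 0.1   # maximum |DS - called alt-allele count| (assumed rule)
MIN_INFO = 0.8     # minimum per-site INFO score


@dataclass
class ImputedCall:
    """One imputed genotype at a biallelic site (illustrative container)."""
    gp: Tuple[float, float, float]  # P(0/0), P(0/1), P(1/1)
    ds: float                       # expected alternate allele count, 0..2
    info: float                     # site-level imputation INFO score


def gdi_like_filter(
    call: ImputedCall,
    min_gp: float = MIN_GP,
    max_ds_gap: float = MAX_DS_GAP,
    min_info: float = MIN_INFO,
) -> Optional[int]:
    """Return the called alt-allele count (0, 1, or 2), or None if filtered out."""
    # Most likely genotype, expressed as its alternate allele count.
    best = max(range(3), key=lambda g: call.gp[g])
    if call.info < min_info:
        return None  # site imputed too poorly overall
    if call.gp[best] < min_gp:
        return None  # genotype call not confident enough
    if abs(call.ds - best) > max_ds_gap:
        return None  # dosage disagrees with the hard call
    return best


confident = ImputedCall(gp=(0.01, 0.98, 0.01), ds=1.0, info=0.95)
ambiguous = ImputedCall(gp=(0.50, 0.45, 0.05), ds=0.55, info=0.95)
print(gdi_like_filter(confident), gdi_like_filter(ambiguous))
```

Because the summary reports that this kind of filtering helps PCA but not PGS prediction for height, any real thresholds would presumably need to be tuned per downstream analysis rather than fixed as above.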