Unsupervised Discovery of Ancestry Informative Markers and Genetic Admixture Proportions in Biobank-Scale Data Sets

Bibliographic Details
Published in	bioRxiv
Main Authors	Ko, Seyoon; Chu, Benjamin B; Peterson, Daniel; Chidera Okenwa; Papp, Jeanette C; Alexander, David H; Sobel, Eric M; Zhou, Hua; Lange, Kenneth L
Format	Paper
Language	English
Published	Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 24.10.2022
Subjects	Biobanks; Computer applications; Computer programs; Single-nucleotide polymorphism
Summary:	Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWAS). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burden imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry informative markers (AIMs), that is, SNPs that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require known ancestry labels for a subset of the sample. This supervised learning approach creates a chicken-and-egg scenario: ancestry labels are needed to select the AIMs, yet the AIMs are wanted precisely to infer ancestry. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank data sets. Our implementation of the method is called OpenADMIXTURE. Competing Interest Statement: The authors have declared no competing interest.
DOI:	10.1101/2022.10.22.513294
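
Illustrative note: the summary refers to likelihood-based estimation of admixture proportions. As a minimal sketch, assuming the standard binomial model used by ADMIXTURE-style programs (not taken from the OpenADMIXTURE implementation), the log-likelihood that such programs maximize can be written as below. The function name and the matrix names G, Q, and P are assumptions chosen for illustration, and the constant binomial coefficient is omitted.

```python
import math

def admixture_loglik(G, Q, P):
    """Log-likelihood of an ADMIXTURE-style binomial model (illustrative sketch).

    G[i][j]: allele count (0, 1, or 2) for individual i at SNP j; None if missing
    Q[i][k]: admixture proportion of individual i from population k (rows sum to 1)
    P[k][j]: allele frequency of SNP j in population k
    """
    n_pops = len(P)
    total = 0.0
    for i, genotypes in enumerate(G):
        for j, g in enumerate(genotypes):
            if g is None:
                continue  # skip missing genotypes
            # Allele frequency for individual i at SNP j: a mixture over populations
            f = sum(Q[i][k] * P[k][j] for k in range(n_pops))
            # Clamp away from 0 and 1 so the logarithms stay finite
            f = min(max(f, 1e-12), 1.0 - 1e-12)
            total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total
```

Under this model, a marker whose rows of P differ strongly across populations changes the likelihood most when Q changes, which is one way to see why restricting the analysis to AIMs can preserve most of the ancestry signal at lower cost.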