Novel methods for genotype imputation to whole-genome sequence and a simple linear model to predict imputation accuracy

Bibliographic Details
Published in	BMC Genetics Vol. 18; no. 1; p. 120
Main Authors	Larmer, Steven G, Sargolzaei, Mehdi, Brito, Luiz F, Ventura, Ricardo V, Schenkel, Flávio S
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd, 27.12.2017
Subjects	Algorithms; Animals; Cattle; Cattle - genetics; Cattle genomics; DNA sequencing; Genetic aspects; Genomic clustering; Genotype imputation; Linear Models; Methodology; Models, Genetic; Nucleotide sequencing; Polymorphism, Single Nucleotide; Sequencing data; Whole Genome Sequencing - veterinary

More Information
Summary:	Accurate imputation plays a major role in genomic studies in livestock industries, where the number of genotyped or sequenced animals is limited by cost. This study explored methods to create an ideal reference population for imputation to Next Generation Sequencing data in cattle. Methods for clustering animals for imputation were explored using 1000 Bull Genomes Project sequence data on 1146 animals from a variety of beef and dairy breeds. Imputation from 50 K to 777 K was first carried out to choose an ideal clustering method, using the ADMIXTURE or PLINK clustering algorithms with either genotypes or reconstructed haplotypes. Owing to its efficiency, accuracy and ease of use, clustering with PLINK using haplotypes as quasi-genotypes was chosen as the most advantageous grouping method. Using a clustered population slightly decreased computing time while maintaining accuracy across the population. Although overall accuracy remained the same, accuracy increased slightly for groups of animals in some breeds (primarily purebred beef cattle from breeds with fewer sequenced animals) and decreased slightly for other groups, primarily crossbred animals. However, some animals in each breed were poorly imputed by every method. When imputed sequences were included in the reference population to aid imputation of poorly imputed animals, a small increase in overall accuracy was observed for nearly every individual in the population. Two models were created to predict imputation accuracy: a complete model using all available information, including Euclidean distances from genotypes and haplotypes, pedigree information and clustering groups, and a simple model using only breed and a Euclidean distance matrix as predictors.
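The simple model can be illustrated as an ordinary least-squares regression of per-animal imputation accuracy on breed and a genotype distance. This is a minimal sketch, not the authors' implementation. It assumes genotypes are coded as 0/1/2 allele dosages, and it assumes the Euclidean distance matrix is summarized for each target animal by its minimum distance to the reference animals (the hypothetical helper `min_distance_to_reference`). The abstract does not state how the matrix entered the model. All function names are illustrative.

```python
import math

def euclidean(g1, g2):
    """Euclidean distance between two genotype vectors coded 0/1/2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))

def min_distance_to_reference(target, reference):
    """Hypothetical summary of the distance matrix: distance to the closest reference animal."""
    return min(euclidean(target, r) for r in reference)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def design_row(breed, distance, breeds):
    """Intercept, breed dummies (first breed is the baseline), then distance."""
    return [1.0] + [1.0 if breed == b else 0.0 for b in breeds[1:]] + [distance]

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations X'X beta = X'y."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(beta, row):
    """Predicted accuracy (e.g. concordance rate) for one animal."""
    return sum(b * x for b, x in zip(beta, row))
```

For example, with the placeholder breeds `["Angus", "Holstein"]`, `design_row("Holstein", 2.0, breeds)` gives `[1.0, 1.0, 2.0]`; fitting `fit_ols` on rows built this way for animals with known accuracy, then calling `predict` on a new animal's row, yields its expected accuracy. Solving the normal equations directly keeps the sketch dependency-free; a numerical library would normally be used for real data.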
Both models were successful in predicting imputation accuracy, with correlations between predicted and true imputation accuracy (measured as concordance rate) of 0.87 and 0.83, respectively. A clustering methodology can be very useful for subgrouping cattle for efficient genotype imputation. In addition, the accuracy of genotype imputation from medium- to high-density single nucleotide polymorphism (SNP) chip panels to whole-genome sequence can be predicted well using the simple linear model defined in this study.
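The two evaluation statistics mentioned above can be illustrated with two small helpers: concordance rate, taken here as the proportion of genotype calls where the imputed genotype matches the true (sequence) genotype, and the Pearson correlation between predicted and observed accuracies. This is a sketch under stated assumptions: it assumes there are no missing calls and that accuracy is computed per animal. The exact definitions used in the study may differ, and the function names are hypothetical.

```python
import math

def concordance_rate(imputed, true):
    """Proportion of calls where the imputed genotype equals the true genotype.

    Assumes both vectors are complete (no missing calls) and aligned by SNP.
    """
    if len(imputed) != len(true) or not true:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(imputed, true) if a == b)
    return matches / len(true)

def pearson(x, y):
    """Pearson correlation between predicted and observed accuracies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Applied per animal, `concordance_rate` gives the observed accuracies, and `pearson` between those values and a model's predictions gives a correlation of the kind reported for the complete and simple models.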
ISSN:	1471-2156
DOI:	10.1186/s12863-017-0588-1