Comparison of algorithms to infer genetic population structure from unlinked molecular markers

Bibliographic Details
Published in	Statistical applications in genetics and molecular biology Vol. 13; no. 4; pp. 391 - 402
Main Authors	Peña-Malavera, Andrea, Bruno, Cecilia, Fernandez, Elmer, Balzarini, Monica
Format	Journal Article
Language	English
Published	Germany De Gruyter 01.08.2014
Subjects	Algorithms Alleles Cluster Analysis Computer Simulation Genetics, Population - methods Genotype Models, Genetic Molecular Probes - genetics multilocus-biallelic genotypes plant breeding Polymorphism, Single Nucleotide self-organizing maps Zea mays - genetics
ISSN	2194-6302 1544-6115
DOI	10.1515/sagmb-2013-0006

Summary:	Identifying population genetic structure (PGS) is crucial for breeding and conservation. Several clustering algorithms are available to identify the underlying PGS in genetic data of maize genotypes. In this work, six methods to identify PGS from unlinked molecular marker data were compared using simulated and experimental data consisting of multilocus-biallelic genotypes. Datasets were delineated under different biological scenarios characterized by three levels of genetic divergence among populations (low, medium, and high FST) and two numbers of sub-populations (K=3 and K=5). The relative performance of hierarchical and non-hierarchical clustering, as well as model-based clustering (STRUCTURE) and clustering from neural networks (SOM-RP-Q), was evaluated. The error rate in assigning genotypes to discrete sub-populations was used as the comparison criterion. In scenarios with a high level of divergence among genotype groups, all methods performed well. With a moderate level of genetic divergence (FST=0.2), the algorithms SOM-RP-Q and STRUCTURE performed better than hierarchical and non-hierarchical clustering. In all simulated scenarios with low genetic divergence and in the experimental SNP maize panel (largely unlinked), SOM-RP-Q achieved the lowest clustering error rate. The SOM algorithm used here is more effective than the other evaluated methods for sparse unlinked genetic data.
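The summary names the clustering error rate as the comparison criterion but does not say how it is computed. The sketch below shows one common way to define it, and is an assumption rather than the authors' procedure: because cluster labels are arbitrary, each cluster is matched to a true sub-population by trying every one-to-one assignment and keeping the one with the fewest misassigned genotypes. The function name `clustering_error_rate` and the toy labels are hypothetical.

```python
from itertools import permutations


def clustering_error_rate(true_labels, predicted_labels):
    """Fraction of genotypes misassigned under the best one-to-one
    matching of clusters to true sub-populations (assumed definition)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty")

    populations = sorted(set(true_labels))
    clusters = sorted(set(predicted_labels))

    # Pad with None so surplus clusters can stay unmatched (always counted wrong).
    size = max(len(populations), len(clusters))
    populations = populations + [None] * (size - len(populations))

    best_correct = 0
    # Exhaustive search is fine for small K such as 3 or 5.
    for assignment in permutations(populations, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        correct = sum(
            1 for t, p in zip(true_labels, predicted_labels) if mapping[p] == t
        )
        best_correct = max(best_correct, correct)

    return 1 - best_correct / len(true_labels)


# Toy example with K=3: cluster ids are permuted and one genotype is misplaced.
true_pops = [0, 0, 0, 1, 1, 1, 2, 2, 2]
found = [2, 2, 2, 0, 0, 1, 1, 1, 1]
error = clustering_error_rate(true_pops, found)
```

For larger numbers of sub-populations, the exhaustive search could be replaced by an assignment solver; the brute-force version is kept here only because it is easy to check by hand.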