Comparison of algorithms to infer genetic population structure from unlinked molecular markers

Bibliographic Details
Published in	Statistical applications in genetics and molecular biology Vol. 13; no. 4; pp. 391 - 402
Main Authors	Peña-Malavera, Andrea, Bruno, Cecilia, Fernandez, Elmer, Balzarini, Monica
Format	Journal Article
Language	English
Published	Germany De Gruyter 01.08.2014
Subjects	Algorithms; Alleles; Cluster Analysis; Computer Simulation; Genetics, Population - methods; Genotype; Models, Genetic; Molecular Probes - genetics; multilocus-biallelic genotypes; plant breeding; Polymorphism, Single Nucleotide; self-organizing maps; Zea mays - genetics
ISSN	2194-6302 (print), 1544-6115 (electronic)
DOI	10.1515/sagmb-2013-0006

Abstract	Identifying population genetic structure (PGS) is crucial for breeding and conservation. Several clustering algorithms are available for identifying the underlying PGS in genetic data from maize genotypes. In this work, six methods for identifying PGS from unlinked molecular marker data were compared using simulated and experimental data consisting of multilocus-biallelic genotypes. Datasets represented different biological scenarios, characterized by three levels of genetic divergence among populations (low, medium, and high FST) and two numbers of sub-populations (K = 3 and K = 5). The relative performance of hierarchical and non-hierarchical clustering, model-based clustering (STRUCTURE), and neural-network clustering (SOM-RP-Q) was evaluated. The error rate in clustering genotypes into discrete sub-populations was used as the comparison criterion. In scenarios with a high level of divergence among genotype groups, all methods performed well. With a moderate level of genetic divergence (FST = 0.2), SOM-RP-Q and STRUCTURE outperformed hierarchical and non-hierarchical clustering. In all simulated scenarios with low genetic divergence, and in the experimental maize SNP panel (largely unlinked markers), SOM-RP-Q achieved the lowest clustering error rate. The SOM algorithm used here was more effective than the other evaluated methods for sparse, unlinked genetic data.
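The comparison criterion named in the abstract, the clustering error rate, can be illustrated with a minimal Python sketch. It assumes the Balding-Nichols model for generating unlinked biallelic genotypes at a target FST, and it defines the error rate as the share of genotypes misassigned under the best one-to-one matching between clusters and true sub-populations. The record does not state the paper's actual simulation model or exact error definition; `simulate_genotypes`, `clustering_error_rate`, and the Uniform(0.1, 0.9) ancestral frequency range are hypothetical choices made for illustration only.

```python
import itertools
import random


def simulate_genotypes(n_per_pop, k, n_loci, fst, seed=0):
    """Hypothetical helper: biallelic genotypes coded 0/1/2 for k sub-populations,
    drawn under the Balding-Nichols model (an assumption, not the paper's stated model)."""
    if not 0 < fst < 1:
        raise ValueError("fst must lie strictly between 0 and 1")
    rng = random.Random(seed)
    # Ancestral allele frequencies; the Uniform(0.1, 0.9) range is an assumption.
    ancestral = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
    scale = (1 - fst) / fst
    genotypes, labels = [], []
    for pop in range(k):
        # Each sub-population's allele frequency is Beta-distributed with mean p and
        # variance p(1 - p) * fst, so larger fst means more divergence.
        freqs = [rng.betavariate(p * scale, (1 - p) * scale) for p in ancestral]
        for _ in range(n_per_pop):
            # Two independent allele draws per locus (Hardy-Weinberg, unlinked loci).
            genotypes.append([int(rng.random() < q) + int(rng.random() < q) for q in freqs])
            labels.append(pop)
    return genotypes, labels


def clustering_error_rate(true_labels, predicted_labels):
    """Hypothetical helper: fraction of individuals misassigned, minimised over
    one-to-one matchings of predicted clusters to true sub-populations."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have equal length")
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(predicted_labels))
    # Placeholders let surplus predicted clusters map to "no population";
    # their members then count as errors.
    candidates = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    # Brute force over matchings: fine for small K (5! = 120 for K = 5).
    for targets in itertools.permutations(candidates, len(pred_ids)):
        mapping = dict(zip(pred_ids, targets))
        correct = sum(mapping[p] == t for p, t in zip(predicted_labels, true_labels))
        best = max(best, correct)
    return 1 - best / len(true_labels)


if __name__ == "__main__":
    genotypes, labels = simulate_genotypes(n_per_pop=20, k=3, n_loci=100, fst=0.2)
    print(len(genotypes), "genotypes x", len(genotypes[0]), "loci")
    # Cluster IDs are arbitrary, so a pure relabeling is not an error.
    print(clustering_error_rate([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))
```

For example, `clustering_error_rate([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 1])` gives 1/6, since one of six genotypes lands in the wrong cluster. The exhaustive matching loop keeps the sketch dependency-free; for larger K, an assignment solver such as SciPy's `linear_sum_assignment` (Hungarian algorithm) on the cluster-by-population contingency table would be the usual replacement.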
Affiliations	Peña-Malavera, Andrea; Bruno, Cecilia; Balzarini, Monica (mbalzari@gmail.com): Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba and CONICET (National Council of Scientific and Technological Research), cc 509, 5000 Córdoba, Argentina. Fernandez, Elmer: Facultad de Ingeniería, Universidad Católica de Córdoba and CONICET, Camino Alta Gracia Km 10, Cordoba, Argentina
Discipline	Biology
Genre	Comparative Study; Review; Research Support, Non-U.S. Gov't; Journal Article
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
PMID	24964261
PageCount	12
PublicationTitleAlternate	Stat Appl Genet Mol Biol
URI	https://www.degruyter.com/doi/10.1515/sagmb-2013-0006 https://www.ncbi.nlm.nih.gov/pubmed/24964261 https://www.proquest.com/docview/1557081709