Comparison of algorithms to infer genetic population structure from unlinked molecular markers
Published in | Statistical Applications in Genetics and Molecular Biology, Vol. 13, no. 4, pp. 391-402 |
---|---|
Main Authors | Peña-Malavera, Andrea; Bruno, Cecilia; Fernandez, Elmer; Balzarini, Monica |
Format | Journal Article |
Language | English |
Published | De Gruyter, Germany, 1 August 2014 |
Subjects | |
Online Access | Get full text |
ISSN | 2194-6302 (print); 1544-6115 (electronic) |
DOI | 10.1515/sagmb-2013-0006 |
Abstract | Identifying population genetic structure (PGS) is crucial for breeding and conservation. Several clustering algorithms can be applied to genetic data from maize genotypes to identify the underlying PGS. In this work, six methods to identify PGS from unlinked molecular marker data were compared using simulated and experimental data consisting of multilocus biallelic genotypes. Datasets were constructed under different biological scenarios characterized by three levels of genetic divergence among populations (low, medium, and high $F_{ST}$) and two numbers of sub-populations ($K=3$ and $K=5$). We compared the relative performance of hierarchical and non-hierarchical clustering, model-based clustering (STRUCTURE), and clustering based on neural networks (SOM-RP-Q), using the error rate of clustering genotypes into discrete sub-populations as the comparison criterion. In scenarios with a high level of divergence among genotype groups, all methods performed well. With a moderate level of genetic divergence ($F_{ST}=0.2$), SOM-RP-Q and STRUCTURE performed better than hierarchical and non-hierarchical clustering. In all simulated scenarios with low genetic divergence, and in the experimental maize SNP panel (largely unlinked), SOM-RP-Q achieved the lowest clustering error rate. The SOM algorithm used here was more effective than the other evaluated methods for sparse unlinked genetic data. |
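The simulated scenarios combine three levels of $F_{ST}$ with $K=3$ or $K=5$ sub-populations of unlinked biallelic markers. This record does not describe how the genotypes were generated, so the Python sketch below uses the Balding-Nichols model as an assumed stand-in: each sub-population's allele frequency at a locus is drawn from a Beta distribution centred on an ancestral frequency, with a spread controlled by $F_{ST}$. The function `simulate_balding_nichols` and the Uniform(0.1, 0.9) ancestral frequencies are hypothetical choices for illustration, not the authors' simulator.

```python
import random
from typing import List, Tuple


def simulate_balding_nichols(n_pops: int, n_per_pop: int, n_loci: int,
                             fst: float, seed: int = 1) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical simulator: diploid 0/1/2 genotypes for n_pops sub-populations.

    Assumption: Balding-Nichols model, p_k ~ Beta(p(1-F)/F, (1-p)(1-F)/F),
    with loci independent of each other (unlinked markers).
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("fst must be in (0, 1)")
    rng = random.Random(seed)
    # Ancestral allele frequency per locus (arbitrary illustrative range).
    ancestral = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]
    scale = (1.0 - fst) / fst
    # Sub-population frequencies scatter around the ancestral ones; larger F_ST -> more scatter.
    pop_freqs = [[rng.betavariate(p * scale, (1.0 - p) * scale) for p in ancestral]
                 for _ in range(n_pops)]
    genotypes: List[List[int]] = []
    labels: List[int] = []
    for k, freqs in enumerate(pop_freqs):
        for _ in range(n_per_pop):
            # Genotype = number of copies of the counted allele in two independent draws.
            genotypes.append([(rng.random() < f) + (rng.random() < f) for f in freqs])
            labels.append(k)  # true sub-population, kept for scoring a clustering later
    return genotypes, labels


if __name__ == "__main__":
    # One replicate of a K=5 scenario at the moderate divergence level (F_ST = 0.2);
    # sample size and marker count are arbitrary.
    g, lab = simulate_balding_nichols(n_pops=5, n_per_pop=20, n_loci=100, fst=0.2)
    print(len(g), len(g[0]))
```

Lowering `fst` pulls the sub-population frequencies toward the shared ancestral values, which is what makes the low-divergence scenarios harder for every clustering method.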
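The comparison criterion is the clustering error rate. Cluster IDs returned by any algorithm are arbitrary, so they have to be matched to the true sub-populations before errors can be counted. A minimal Python sketch follows, assuming (this is not stated in the record) that the matching uses the label permutation with the fewest misassigned genotypes; `clustering_error_rate` is a hypothetical name.

```python
from itertools import permutations
from typing import Sequence


def clustering_error_rate(true_labels: Sequence[int], cluster_labels: Sequence[int]) -> float:
    """Fraction of genotypes assigned to the wrong sub-population.

    Assumption: labels are integers 0..K-1, and clusters are relabelled with the
    permutation that best agrees with the true sub-populations.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label vectors must have the same length")
    k = max(max(true_labels), max(cluster_labels)) + 1
    best_correct = 0
    # Try every mapping cluster ID -> sub-population and keep the best agreement.
    for perm in permutations(range(k)):
        correct = sum(1 for t, c in zip(true_labels, cluster_labels) if perm[c] == t)
        best_correct = max(best_correct, correct)
    return 1.0 - best_correct / len(true_labels)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    found = [2, 2, 2, 0, 0, 1, 1, 1, 1]
    # Best mapping 2->0, 0->1, 1->2 leaves one genotype misassigned: 1/9, about 0.111.
    print(clustering_error_rate(truth, found))
```

Enumerating all $K!$ mappings is cheap for $K \le 5$ (at most 120); for larger $K$ the same optimal matching can be found with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.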
Affiliations | Peña-Malavera, Andrea; Bruno, Cecilia; Balzarini, Monica: Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba and CONICET (National Council of Scientific and Technological Research), cc 509, 5000 Córdoba, Argentina. Fernandez, Elmer: Facultad de Ingeniería, Universidad Católica de Córdoba and CONICET, Camino Alta Gracia Km 10, Córdoba, Argentina |
Contact | Balzarini, Monica (mbalzari@gmail.com) |
PubMed | https://www.ncbi.nlm.nih.gov/pubmed/24964261 (view this record in MEDLINE/PubMed) |
Discipline | Biology |
Genre | Comparative Study; Review; Research Support, Non-U.S. Gov't; Journal Article |
IsPeerReviewed | true |
IsScholarly | true |
PMID | 24964261 |
PageCount | 12 |
PublicationTitleAlternate | Stat Appl Genet Mol Biol |
SubjectTerms | Algorithms; Alleles; Cluster Analysis; Computer Simulation; Genetics, Population - methods; Genotype; Models, Genetic; Molecular Probes - genetics; multilocus-biallelic genotypes; plant breeding; Polymorphism, Single Nucleotide; self-organizing maps; Zea mays - genetics |
URI | https://www.degruyter.com/doi/10.1515/sagmb-2013-0006; https://www.ncbi.nlm.nih.gov/pubmed/24964261; https://www.proquest.com/docview/1557081709 |