Comparison of algorithms to infer genetic population structure from unlinked molecular markers

Bibliographic Details
Published in Statistical applications in genetics and molecular biology Vol. 13; no. 4; pp. 391-402
Main Authors Peña-Malavera, Andrea; Bruno, Cecilia; Fernandez, Elmer; Balzarini, Monica
Format Journal Article
Language English
Published Germany De Gruyter 01.08.2014
ISSN 2194-6302
1544-6115
DOI 10.1515/sagmb-2013-0006


Abstract Identifying population genetic structure (PGS) is crucial for breeding and conservation. Several clustering algorithms are available to identify the underlying PGS from genetic data of maize genotypes. In this work, six methods to identify PGS from unlinked molecular marker data were compared using simulated and experimental data consisting of multilocus-biallelic genotypes. Datasets were generated under different biological scenarios characterized by three levels of genetic divergence among populations (low, medium, and high FST) and two numbers of sub-populations (K=3 and K=5). The relative performance of hierarchical and non-hierarchical clustering, as well as model-based clustering (STRUCTURE) and clustering from neural networks (SOM-RP-Q), was evaluated. The clustering error rate of genotypes into discrete sub-populations was used as the comparison criterion. In scenarios with a high level of divergence among genotype groups, all methods performed well. With a moderate level of genetic divergence (FST=0.2), the algorithms SOM-RP-Q and STRUCTURE performed better than hierarchical and non-hierarchical clustering. In all simulated scenarios with low genetic divergence and in the experimental SNP maize panel (largely unlinked), SOM-RP-Q achieved the lowest clustering error rate. The SOM algorithm used here is more effective than the other evaluated methods for sparse unlinked genetic data.
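The comparison criterion named in the abstract, the clustering error rate, can be illustrated with a minimal Python sketch. The abstract does not say how cluster labels were matched to the true sub-populations, so this sketch assumes the error is taken under the best one-to-one relabeling of clusters, found by brute force (feasible for K=3 or K=5). The function name `clustering_error_rate` and the toy labels are hypothetical and are not taken from the article.

```python
from itertools import permutations


def clustering_error_rate(true_labels, predicted_labels):
    """Fraction of genotypes misassigned under the best cluster relabeling.

    Assumption (not from the article): each cluster is matched to at most
    one true sub-population, and the matching with the most agreements wins.
    """
    n = len(true_labels)
    if n == 0 or n != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and equally long")

    # Distinct labels in order of first appearance (no sorting needed).
    populations = list(dict.fromkeys(true_labels))
    clusters = list(dict.fromkeys(predicted_labels))

    # If there are more clusters than populations, surplus clusters map to
    # None, which never equals a true label and so always counts as an error.
    padding = [None] * max(0, len(clusters) - len(populations))
    targets = populations + padding

    best_correct = 0
    for assignment in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        correct = sum(
            mapping[pred] == true
            for true, pred in zip(true_labels, predicted_labels)
        )
        best_correct = max(best_correct, correct)

    return 1 - best_correct / n


# Toy example: 9 genotypes, K=3, one genotype placed in the wrong cluster.
true_pops = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clusters = ["b", "b", "a", "a", "a", "a", "c", "c", "c"]
error = clustering_error_rate(true_pops, clusters)
```

Because cluster names are arbitrary, comparing labels directly would overstate the error; searching over relabelings avoids that. For larger K, an assignment solver would be a more scalable choice than enumerating permutations.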
Author Peña-Malavera, Andrea
Fernandez, Elmer
Balzarini, Monica
Bruno, Cecilia
Author_xml – sequence: 1
  givenname: Andrea
  surname: Peña-Malavera
  fullname: Peña-Malavera, Andrea
  organization: Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba and CONICET (National Council of Scientific and Technological Research), cc 509, 5000 Córdoba, Argentina
– sequence: 2
  givenname: Cecilia
  surname: Bruno
  fullname: Bruno, Cecilia
  organization: Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba and CONICET (National Council of Scientific and Technological Research), cc 509, 5000 Córdoba, Argentina
– sequence: 3
  givenname: Elmer
  surname: Fernandez
  fullname: Fernandez, Elmer
  organization: Facultad de Ingeniería, Universidad Católica de Córdoba and CONICET, Camino Alta Gracia Km 10, Córdoba, Argentina
– sequence: 4
  givenname: Monica
  surname: Balzarini
  fullname: Balzarini, Monica
  email: mbalzari@gmail.com
  organization: Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba and CONICET (National Council of Scientific and Technological Research), cc 509, 5000 Córdoba, Argentina
BackLink https://www.ncbi.nlm.nih.gov/pubmed/24964261 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1007_s13238_016_0302_5
crossref_primary_10_1080_00275514_2017_1307095
crossref_primary_10_1111_jcmm_15618
crossref_primary_10_1007_s10681_020_2569_0
crossref_primary_10_1038_srep15728
crossref_primary_10_1002_arch_22015
crossref_primary_10_1007_s10681_021_02926_5
crossref_primary_10_1590_1984_70332018v18n3n45
ContentType Journal Article
Discipline Biology
EISSN 1544-6115
EndPage 402
ExternalDocumentID 24964261
10_1515_sagmb_2013_0006
10_1515_sagmb_2013_0006134391
Genre Comparative Study
Review
Research Support, Non-U.S. Gov't
Journal Article
ISSN 2194-6302
1544-6115
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 4
Language English
PMID 24964261
PQID 1557081709
PQPubID 23479
PageCount 12
PublicationCentury 2000
PublicationDate 2014-08-01
PublicationDateYYYYMMDD 2014-08-01
PublicationDate_xml – month: 08
  year: 2014
  text: 2014-08-01
  day: 01
PublicationDecade 2010
PublicationPlace Germany
PublicationPlace_xml – name: Germany
PublicationTitle Statistical applications in genetics and molecular biology
PublicationTitleAlternate Stat Appl Genet Mol Biol
PublicationYear 2014
Publisher De Gruyter
Publisher_xml – name: De Gruyter
SecondaryResourceType review_article
StartPage 391
SubjectTerms Algorithms
Alleles
Cluster Analysis
Computer Simulation
Genetics, Population - methods
Genotype
Models, Genetic
Molecular Probes - genetics
multilocus-biallelic genotypes
plant breeding
Polymorphism, Single Nucleotide
self-organizing maps
Zea mays - genetics
Title Comparison of algorithms to infer genetic population structure from unlinked molecular markers
URI https://www.degruyter.com/doi/10.1515/sagmb-2013-0006
https://www.ncbi.nlm.nih.gov/pubmed/24964261
https://www.proquest.com/docview/1557081709
Volume 13