Machine learning approaches delimit cryptic taxa in a previously intractable species complex


Bibliographic Details
Published in Molecular Phylogenetics and Evolution Vol. 195; p. 108061
Main Authors Heine, Haley L.A., Derkarabetian, Shahan, Morisawa, Rina, Fu, Phoebe A., Moyes, Nathaniel H.W., Boyer, Sarah L.
Format Journal Article
Language English
Published United States Elsevier Inc 01.06.2024
Summary:
•Dispersal-limited low-vagility organisms are particularly challenging systems for species delimitation due to extreme phylogeographic structuring.
•New machine learning approaches can be powerful tools for understanding species boundaries.
•Incorporation of custom training datasets from biologically relevant systems can improve estimation of species boundaries.

Cryptic species are not diagnosable via morphological criteria, but can be detected through analysis of DNA sequences. A number of methods have been developed for identifying species based on genetic data; however, these methods are prone to over-splitting taxa with extreme population structure, such as dispersal-limited organisms. Machine learning methodologies have the potential to overcome this challenge. Here, we apply such approaches, using a large dataset generated through hybrid target enrichment of ultraconserved elements (UCEs). Our study taxon is the Aoraki denticulata species complex, a lineage of extremely low-dispersal arachnids endemic to the South Island of Aotearoa New Zealand. This group of mite harvesters has been the subject of previous species delimitation studies using smaller datasets generated through Sanger sequencing and analytical approaches that rely on multispecies coalescent models and barcoding gap discovery. Those analyses yielded a number of putative cryptic species that seems unrealistic and extreme, based on what we know about species’ geographic ranges and genetic diversity in non-cryptic mite harvesters. We find that machine learning approaches, on the other hand, identify cryptic species with geographic ranges that are similar to those seen in other morphologically diagnosable mite harvesters in Aotearoa New Zealand’s South Island. We performed both unsupervised and supervised machine learning analyses, the latter with training data drawn either from animals broadly (vagile and non-vagile) or from a custom training dataset from dispersal-limited harvesters. We conclude that applying machine learning approaches to the analysis of UCE-derived genetic data is an effective method for delimiting species in complexes of low-vagility cryptic species, and that the incorporation of training data from biologically relevant analogues can be critically informative.
ISSN:1055-7903
1095-9513
DOI:10.1016/j.ympev.2024.108061