A comparison of machine learning techniques for taxonomic classification of teeth from the Family Bovidae

Bibliographic Details
Published in: Journal of Applied Statistics, Vol. 45, no. 15, pp. 2773-2787
Main Authors: Matthews, Gregory J.; Brophy, Juliet K.; Luetkemeier, Maxwell; Gu, Hongie; Thiruvathukal, George K.
Format: Journal Article
Language: English
Published: Abingdon: Taylor & Francis (Taylor & Francis Ltd), 18 November 2018

Summary: This study explores how well machine learning algorithms classify fossil teeth in the Family Bovidae. Isolated bovid teeth are typically the most common fossils found in southern Africa, and they often form the basis of paleoenvironmental reconstructions. Taxonomic identification of fossil bovid teeth, however, is often imprecise and subjective. Machine learning algorithms trained on modern teeth of known taxa can be used to classify fossils. Previous work by Brophy et al. [Quantitative morphological analysis of bovid teeth and implications for paleoenvironmental reconstruction of Plovers Lake, Gauteng Province, South Africa, J. Archaeol. Sci. 41 (2014), pp. 376-388] used elliptical Fourier analysis of the form (size and shape) of the outline of each tooth's occlusal surface to generate features for a linear discriminant analysis (LDA) framework. This manuscript extends that work by exploring how different machine learning approaches classify the teeth and testing which technique classifies them best. In addition to LDA, four other machine learning techniques were considered (neural networks, nuclear penalized multinomial regression, random forests, and support vector machines); support vector machines and random forests performed best in terms of both log loss and classification rate.
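Two computational ingredients named in the summary, elliptical Fourier features of a tooth outline and the log loss and classification rate used to rank classifiers, can be illustrated with a minimal Python sketch. It assumes the standard Kuhl and Giardina elliptic Fourier formulation for a closed, piecewise-linear outline with no size normalization (so the coefficients retain form, i.e. size and shape), and the usual multiclass definition of log loss as the mean negative log of the probability assigned to each specimen's true taxon. The function names (`efa_coefficients`, `log_loss`, `classification_rate`), the circular test outline, and the predicted probabilities are hypothetical illustrations. They are not the authors' code, settings, or data, and the number of harmonics used in the study is not stated here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def efa_coefficients(
    contour: Sequence[Point], harmonics: int
) -> List[Tuple[float, float, float, float]]:
    """Elliptic Fourier coefficients (a_n, b_n, c_n, d_n), n = 1..harmonics.

    Sketch of the Kuhl & Giardina formulation for a closed polygonal outline.
    No size or rotation normalization is applied, so the coefficients encode
    form (size and shape). Hypothetical helper, not the authors' code.
    """
    k = len(contour)
    # Edge vectors, with the last point joined back to the first to close the outline
    dx = [contour[(i + 1) % k][0] - contour[i][0] for i in range(k)]
    dy = [contour[(i + 1) % k][1] - contour[i][1] for i in range(k)]
    dt = [math.hypot(u, v) for u, v in zip(dx, dy)]
    # Cumulative arc length t[0] = 0, ..., t[k] = T (the perimeter)
    t = [0.0]
    for step in dt:
        t.append(t[-1] + step)
    period = t[-1]

    coeffs = []
    for n in range(1, harmonics + 1):
        scale = period / (2 * n * n * math.pi ** 2)
        w = 2 * n * math.pi / period
        a = b = c = d = 0.0
        for p in range(k):
            if dt[p] == 0.0:
                continue  # repeated consecutive points contribute nothing
            dcos = math.cos(w * t[p + 1]) - math.cos(w * t[p])
            dsin = math.sin(w * t[p + 1]) - math.sin(w * t[p])
            a += dx[p] / dt[p] * dcos
            b += dx[p] / dt[p] * dsin
            c += dy[p] / dt[p] * dcos
            d += dy[p] / dt[p] * dsin
        coeffs.append((scale * a, scale * b, scale * c, scale * d))
    return coeffs


def log_loss(
    y_true: Sequence[int], probs: Sequence[Sequence[float]], eps: float = 1e-15
) -> float:
    """Mean negative log probability assigned to each specimen's true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip so a zero probability for the true class gives a large, finite penalty
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


def classification_rate(
    y_true: Sequence[int], probs: Sequence[Sequence[float]]
) -> float:
    """Fraction of specimens whose highest-probability class is the true class."""
    hits = sum(
        max(range(len(row)), key=row.__getitem__) == label
        for label, row in zip(y_true, probs)
    )
    return hits / len(y_true)


if __name__ == "__main__":
    # Hypothetical outline: a regular 400-gon approximating a circle of radius 2
    outline = [
        (2 * math.cos(2 * math.pi * i / 400), 2 * math.sin(2 * math.pi * i / 400))
        for i in range(400)
    ]
    print(efa_coefficients(outline, harmonics=3))
    # Hypothetical class probabilities for three teeth over three taxa
    y = [0, 1, 2]
    probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2]]
    print(log_loss(y, probs), classification_rate(y, probs))
```

Lower log loss is better. Classification rate only checks whether the top-ranked taxon is correct. Log loss also penalizes a classifier for assigning low probability to the true taxon, so a confidently wrong prediction costs far more than a hesitant one. For the circular test outline, the first harmonic comes out close to a ellipse of radius 2 (a_1 and d_1 near 2, b_1 and c_1 near 0) and higher harmonics are near zero, as expected for a circle.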
ISSN: 0266-4763; 1360-0532
DOI: 10.1080/02664763.2018.1441381