Accurate image-based identification of macroinvertebrate specimens using deep learning - How much training data is needed?

Bibliographic Details
Published in	PeerJ (San Francisco, CA) Vol. 10; p. e13837
Main Authors	Høye, Toke T, Dyrmann, Mads, Kjær, Christian, Nielsen, Johnny, Bruus, Marianne, Mielec, Cecilie L, Vesterdal, Maria S, Bjerge, Kim, Madsen, Sigurd A, Jeppesen, Mads R, Melvad, Claus
Format	Journal Article
Language	English
Published	United States: PeerJ Ltd / PeerJ Inc, 23.08.2022
Subjects	Animals; Arthropods; Artificial intelligence; Biodiversity; Biological Monitoring; Computational Biology; Computer vision; Data Mining and Machine Learning; Deep Learning; Entomology; Equipment and supplies; Fresh Water; Freshwater Biology; Freshwater fauna; Image processing; Imaging systems; Invertebrate identification; Machine learning; Neural networks; Neural Networks, Computer; Monitoring

More Information
Summary:	Image-based methods for species identification offer cost-efficient solutions for biomonitoring. This is particularly relevant for invertebrate studies, where bulk samples often represent insurmountable workloads for sorting, identifying, and counting individual specimens. On the other hand, image-based classification using deep learning tools has strict requirements for the amount of training data, which is often a limiting factor. Here, we examine how classification accuracy increases with the amount of training data using the BIODISCOVER imaging system, constructed for image-based classification and biomass estimation of invertebrate specimens. We use a balanced dataset of 60 specimens of each of 16 taxa of freshwater macroinvertebrates to systematically quantify how the classification performance of a convolutional neural network (CNN) increases, for individual taxa and for the overall community, as the number of specimens used for training is increased. We show a striking 99.2% classification accuracy when the CNN (EfficientNet-B6) is trained on 50 specimens of each taxon. We also show that the lower classification accuracy of models trained on less data is particularly evident for morphologically similar species placed within the same taxonomic order. Even with as few as 15 specimens per taxon used for training, classification accuracy reached 97%. Our results add to a recent body of literature showing the huge potential of image-based methods and deep learning for specimen-based research, and furthermore offer a perspective on future automated approaches for deriving ecological data from bulk arthropod samples.
ISSN:	2167-8359
DOI:	10.7717/peerj.13837
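The experimental design described in the summary (a balanced set of 60 specimens for each of 16 taxa, with CNN accuracy tracked per taxon and overall as the number of training specimens per taxon grows) can be sketched as follows. This is a hypothetical illustration, not the authors' code: the helper names `learning_curve_splits` and `per_taxon_accuracy`, the fixed per-taxon test set, the nested training subsets, and the listed training sizes are all assumptions, and the CNN itself (EfficientNet-B6 in the study) is left out. Under these assumptions, training on 50 of 60 specimens per taxon leaves 10 per taxon, or 160 in total, for testing.

```python
import random
from collections import defaultdict


def learning_curve_splits(specimens, train_sizes, n_test, seed=0):
    """Hypothetical helper: build per-taxon balanced train/test splits.

    specimens: list of (specimen_id, taxon) pairs.
    For every taxon, n_test specimens are held out once as a fixed test set;
    each training set takes the first n remaining specimens per taxon, so
    smaller training sets are subsets of larger ones (an assumed design).
    Returns {n: (train_ids, test_ids)}.
    """
    rng = random.Random(seed)
    by_taxon = defaultdict(list)
    for spec_id, taxon in specimens:
        by_taxon[taxon].append(spec_id)

    test_ids, pools = [], {}
    for taxon in sorted(by_taxon):
        ids = sorted(by_taxon[taxon])
        rng.shuffle(ids)  # random but reproducible order within each taxon
        test_ids.extend(ids[:n_test])
        pools[taxon] = ids[n_test:]

    splits = {}
    for n in train_sizes:
        train_ids = []
        for taxon, pool in pools.items():
            if n > len(pool):
                raise ValueError(f"taxon {taxon!r} has only {len(pool)} training specimens")
            train_ids.extend(pool[:n])
        splits[n] = (train_ids, list(test_ids))
    return splits


def per_taxon_accuracy(y_true, y_pred):
    """Overall (community) accuracy plus accuracy for each true taxon."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    overall = sum(correct.values()) / len(y_true)
    return overall, {t: correct[t] / total[t] for t in total}


# Usage: 16 taxa x 60 specimens, 10 held out per taxon (illustrative sizes).
specimens = [(f"taxon{t:02d}_{i:02d}", f"taxon{t:02d}") for t in range(16) for i in range(60)]
splits = learning_curve_splits(specimens, train_sizes=[5, 15, 30, 50], n_test=10)
train_50, test = splits[50]
print(len(train_50), len(test))  # 800 160
```

Keeping the test set fixed across all training sizes means accuracy differences between sizes reflect the amount of training data rather than a change in which specimens are evaluated; the per-taxon breakdown is what would reveal weaker performance on morphologically similar taxa.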