A new method to control error rates in automated species identification with deep learning algorithms


Bibliographic Details
Published in: Scientific Reports, Vol. 10, no. 1, p. 10972
Main Authors: Villon, Sébastien; Mouillot, David; Chaumont, Marc; Subsol, Gérard; Claverie, Thomas; Villéger, Sébastien
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 03.07.2020

Summary: Processing data from surveys using photos or videos remains a major bottleneck in ecology. Deep Learning Algorithms (DLAs) have been increasingly used to automatically identify organisms in images. However, despite recent advances, it remains difficult to control the error rate of such methods. Here, we propose a new framework to control the error rate of DLAs. More precisely, for each species, a confidence threshold was automatically computed using a training dataset independent from the one used to train the DLAs. These species-specific thresholds were then used to post-process the outputs of the DLAs: based on its classification scores, each image was assigned either to one of the species classes or to a new class called "unsure". We applied this framework to a case study identifying 20 fish species from 13,232 underwater images on coral reefs. The overall rate of species misclassification decreased from 22% with the raw DLAs to 2.98% after post-processing using the thresholds defined to minimize the risk of misclassification. This new framework has the potential to unclog the bottleneck of information extraction from massive digital data while ensuring a high level of accuracy in biodiversity assessment.
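The post-processing described in the summary can be sketched in a few lines. The sketch below is an illustrative assumption, not the authors' code: function names (`apply_thresholds`, `fit_threshold`), the candidate-threshold grid, and the rule of choosing the lowest threshold with no retained misclassifications on the independent validation set are all hypothetical simplifications of the framework.

```python
def apply_thresholds(scores, thresholds):
    """Keep the top-scoring species for one image only if its score
    reaches that species' confidence threshold; otherwise 'unsure'.

    scores: dict mapping species name -> classification score
    thresholds: dict mapping species name -> species-specific threshold
    """
    top = max(scores, key=scores.get)
    return top if scores[top] >= thresholds[top] else "unsure"


def fit_threshold(predictions, grid=(0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99)):
    """For one species, pick the lowest candidate threshold at which every
    retained prediction on an independent validation set is correct.

    predictions: list of (score, is_correct) pairs for validation images
    the raw DLA assigned to this species.
    """
    for t in grid:
        retained = [ok for score, ok in predictions if score >= t]
        if retained and all(retained):
            return t
    return 1.0  # never confident enough: route everything to 'unsure'
```

In this sketch, raising a species' threshold trades coverage (more images end up "unsure") for a lower misclassification rate, which mirrors the drop from 22% to 2.98% reported in the summary.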
ISSN: 2045-2322
DOI: 10.1038/s41598-020-67573-7