Are fast labeling methods reliable? A case study of computer-aided expert annotations on microscopy slides

Bibliographic Details
Main Authors: Marzahl, Christian; Bertram, Christof A.; Aubreville, Marc; Petrick, Anne; Weiler, Kristina; Gläsel, Agnes C.; Fragoso, Marco; Merz, Sophie; Bartenschlager, Florian; Hoppe, Judith; Langenhagen, Alina; Jasensky, Anne; Voigt, Jörn; Klopfleisch, Robert; Maier, Andreas
Format: Journal Article
Language: English
Published: 13.04.2020

Summary: Deep-learning-based pipelines have shown the potential to revolutionize microscopy image diagnostics by providing visual augmentations to a trained pathology expert. However, to match human performance, these methods rely on the availability of vast amounts of high-quality labeled data, which poses a significant challenge. To circumvent this, augmented labeling methods, also known as expert-algorithm collaboration, have recently become popular. However, the potential biases introduced by this mode of operation and their effects on the training of neural networks are not entirely understood. This work aims to shed light on some of these effects by providing a case study for three pathologically relevant diagnostic settings. Ten trained pathology experts performed a labeling task, first without and later with computer-generated augmentation. To investigate different biasing effects, we intentionally introduced errors into the augmentation. Furthermore, we developed a novel loss function that incorporates the experts' annotation consensus into the training of a deep learning classifier. In total, the pathology experts annotated 26,015 cells on 1,200 images in this novel annotation study. Backed by this extensive data set, we found that both the consensus of multiple experts and the accuracy of the deep learning classifier increased significantly in the computer-aided setting compared to unaided annotation. However, a significant percentage of the deliberately introduced false labels was not identified by the experts. Additionally, we showed that our loss function profited from multiple experts and outperformed conventional loss functions. At the same time, systematic errors did not lead to a deterioration of the trained classifier's accuracy. Furthermore, a classifier trained with computer-aided annotations from a single expert can outperform one trained on the combined annotations of up to nine experts.
DOI: 10.48550/arxiv.2004.05838
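Note: The record only describes the consensus-aware loss at a high level. As a rough illustration of how an expert-agreement signal could be folded into classifier training, the sketch below weights a standard per-sample cross-entropy term by the fraction of annotators agreeing with the majority-vote label. The helper name consensus_weighted_ce and the weighting scheme are assumptions made for illustration (in PyTorch), not the loss function proposed in the paper.

    import torch
    import torch.nn.functional as F

    def consensus_weighted_ce(logits, targets, consensus):
        # Per-sample cross-entropy against the majority-vote label
        per_sample = F.cross_entropy(logits, targets, reduction="none")
        # Down-weight cells with low expert agreement, keep unanimous ones at full weight
        return (consensus * per_sample).mean()

    # Toy usage: 4 cells, 3 classes, agreement fractions from ten annotators
    logits = torch.randn(4, 3)                      # classifier outputs
    targets = torch.tensor([0, 2, 1, 0])            # majority-vote labels
    consensus = torch.tensor([1.0, 0.7, 0.9, 0.5])  # share of experts agreeing
    loss = consensus_weighted_ce(logits, targets, consensus)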