Label noise and self-learning label correction in cardiac abnormalities classification
Abstract Objective . Learning to classify cardiac abnormalities requires large and high-quality labeled datasets, which is a challenge in medical applications. Small datasets from various sources are often aggregated to meet this requirement, resulting in a final dataset prone to label noise due to...
Saved in:
Published in | Physiological measurement Vol. 43; no. 9; pp. 94001 - 94012 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published |
IOP Publishing
30.09.2022
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Abstract
Objective
. Learning to classify cardiac abnormalities requires large and high-quality labeled datasets, which is a challenge in medical applications. Small datasets from various sources are often aggregated to meet this requirement, resulting in a final dataset prone to label noise due to inter- and intra-observer variability and different expertise. It is well known that label noise can affect the performance and generalizability of the trained models. In this work, we explore the impact of label noise and self-learning label correction on the classification of cardiac abnormalities on large heterogeneous datasets of electrocardiogram (ECG) signals.
Approach.
A state-of-the-art self-learning multi-class label correction method for image classification is adapted to learn a multi-label classifier for electrocardiogram signals. We evaluated our performance using 5-fold cross-validation on the publicly available PhysioNet/Computing in Cardiology (CinC) 2021 Challenge data, with full and reduced sets of leads. Due to the unknown label noise in the testing set, we tested our approach on the MNIST dataset. We investigated the performance under different levels of structured label noise for both datasets.
Main results.
Under high levels of noise, the cross-validation results of self-learning label correction show an improvement of approximately 3% in the challenge score for the PhysioNet/CinC 2021 Challenge dataset and an improvement in accuracy of 5% and reduction of the expected calibration error of 0.03 for the MNIST dataset. We demonstrate that self-learning label correction can be used to effectively deal with the presence of unknown label noise, also when using a reduced number of ECG leads. |
---|---|
Bibliography: | PMEA-104558.R5 ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
ISSN: | 0967-3334 1361-6579 |
DOI: | 10.1088/1361-6579/ac89cb |