Practical Training Approaches for Discordant Atopic Dermatitis Severity Datasets: Merging Methods With Soft-Label and Train-Set Pruning

Bibliographic Details
Published in	IEEE journal of biomedical and health informatics Vol. 27; no. 1; pp. 166 - 175
Main Authors	Cho, Soo Ick; Lee, Dongheon; Han, Byeol; Lee, Ji Su; Hong, Ji Yeon; Chung, Jin Ho; Lee, Dong Hun; Na, Jung-Im
Format	Journal Article
Language	English
Published	United States IEEE 01.01.2023 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Artificial neural networks; Atopic dermatitis; Bioinformatics; Biological system modeling; Convolutional neural networks; Datasets; Dermatitis; Dermatitis, Atopic; Dermatology; discordance; Eczema; Hospitals; Humans; Immunoglobulin A; investigator's global assessment; Merging; Neural networks; Neural Networks, Computer; Pruning; soft-label; Training
Summary:	Objective assessment of atopic dermatitis (AD) is essential for choosing proper management strategies. This study investigated the performance of convolutional neural network (CNN) models in grading the severity of AD. Five board-certified dermatologists independently evaluated the severity of 9,192 AD images. Severity was evaluated based on the Investigator's Global Assessment (IGA) and six signs of AD. For CNN training, we applied three distinct approaches: 1) ensemble vs. integration, 2) hard-label vs. soft-label, and 3) train-set pruning. For IGA prediction, the two best models were chosen based on the macro-averaged AUROC and F1-score. The ensemble-soft-label-pruning model was chosen based on its AUROC of 0.943 and 0.927 for the internal and external validation sets, respectively, and the integration-soft-label-whole-dataset model was chosen based on its F1-score of 0.750 and 0.721 for the internal and external validation sets, respectively. CNN models trained on the multi-evaluator dataset outperformed models trained on an individual evaluator's dataset, and they performed better on the dataset in which the dermatologists' assessments were concordant. In conclusion, CNN models for AD could be improved by using a dataset labeled by multiple evaluators, merging methods with soft-label, and train-set pruning.
ISSN:	2168-2194 2168-2208
DOI:	10.1109/JBHI.2022.3218166
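
Illustrative note: the summary contrasts hard-label with soft-label training and mentions train-set pruning, but this record does not give the implementation. The following is a minimal sketch of how such ideas are commonly realized, not the authors' method. The 0-4 grade range, the function names, and the "spread" pruning rule are all assumptions made for illustration.

```python
import numpy as np

# Assumption: severity grades are integers 0..4. The record does not state the scale.
NUM_GRADES = 5


def soft_label(ratings, num_classes=NUM_GRADES):
    """Turn one image's evaluator grades into a probability vector over grades."""
    counts = np.bincount(np.asarray(ratings), minlength=num_classes)
    return counts / counts.sum()


def hard_label(ratings, num_classes=NUM_GRADES):
    """Majority-vote grade; ties resolve to the lowest tied grade (argmax rule)."""
    counts = np.bincount(np.asarray(ratings), minlength=num_classes)
    return int(np.argmax(counts))


def cross_entropy(pred_probs, target_probs):
    """Cross-entropy of predicted probabilities against a (soft or one-hot) target."""
    pred = np.clip(np.asarray(pred_probs, dtype=float), 1e-12, 1.0)
    return float(-(np.asarray(target_probs) * np.log(pred)).sum())


def prune_discordant(all_ratings, max_spread=1):
    """Hypothetical pruning rule: keep images whose highest and lowest grades
    differ by at most max_spread. Returns the indices of the kept images."""
    arr = np.asarray(all_ratings)
    spread = arr.max(axis=1) - arr.min(axis=1)
    return np.flatnonzero(spread <= max_spread)


# Three hypothetical images, each graded by five evaluators.
ratings = [
    [2, 2, 3, 3, 3],  # mild disagreement
    [0, 1, 3, 4, 4],  # strongly discordant
    [1, 1, 1, 1, 1],  # fully concordant
]
targets = np.stack([soft_label(r) for r in ratings])
kept = prune_discordant(ratings)
```

In this sketch, a soft target keeps the share of evaluators behind each grade, whereas a hard label collapses it to a single grade. Pruning then drops the images whose grades disagree by more than the chosen spread before training.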