Classification of imbalanced oral cancer image data from high-risk population

Bibliographic Details
Published in	Journal of Biomedical Optics Vol. 26; no. 10; p. 105001
Main Authors	Song, Bofan, Li, Shaobai, Sunny, Sumsum, Gurushanth, Keerthi, Mendonca, Pramila, Mukhia, Nirza, Patrick, Sanjana, Gurudath, Shubha, Raghavan, Subhashini, Tsusennaro, Imchen, Leivon, Shirley T, Kolur, Trupti, Shetty, Vivek, Bushan, Vidya, Ramesh, Rohan, Peterson, Tyler, Pillai, Vijay, Wilder-Smith, Petra, Sigamani, Alben, Suresh, Amritha, Kuriakose, Moni Abraham, Birur, Praveen, Liang, Rongguang
Format	Journal Article
Language	English
Published	Bellingham: Society of Photo-Optical Instrumentation Engineers (SPIE), 01.10.2021
Subjects	Algorithms; Bias; Breast cancer; Business metrics; Cancer; Cancer screening; Cheek; Classification; Data augmentation; Datasets; Deep learning; Entropy; Image classification; Machine learning; Medical imaging; Medical research; Medical screening; Neural networks; Oral cancer; Populations; Risk; Risk groups; Unbalance; White light; ensemble learning; deep learning; imbalanced multi-class datasets; mobile screening device; oral cancer

More Information
Summary:	Significance: Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is well suited to disease screening. However, current datasets collected from high-risk populations are imbalanced, which often degrades classification performance. Aim: To reduce the class bias caused by data imbalance. Approach: We collected 3851 polarized white light cheek mucosa images using our customized oral cancer screening device. We used weight balancing, data augmentation, undersampling, focal loss, and ensemble methods to improve neural network classification of oral cancer images on imbalanced multi-class datasets captured from high-risk populations during oral cancer screening in low-resource settings. Results: Applying both data-level and algorithm-level approaches during deep learning training improved performance on the minority classes, which were initially difficult to distinguish. The accuracy of the “premalignancy” class also increased, which is desirable for screening applications. Conclusions: Experimental results show that the class bias induced by imbalanced oral cancer image datasets could be reduced using both data- and algorithm-level methods. Our study may provide an important basis for understanding the influence of imbalanced datasets on oral cancer deep learning classifiers and how to mitigate it.
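The abstract names several standard remedies for class imbalance without giving implementation details. The following is a minimal, framework-free sketch of generic textbook versions of them: inverse-frequency class weighting, random undersampling, a horizontal-flip augmentation, multi-class focal loss, and soft-voting ensembles. It is not the authors' code. The function names, the weighting formula N / (K · n_c), the default gamma = 2.0, and the choice of soft voting are assumptions made for illustration; a real pipeline would apply these ideas inside a deep learning framework's training loop on image tensors.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting: w_c = N / (K * n_c), so rarer classes receive larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def random_undersample(samples, labels, seed=0):
    """Randomly keep only as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    kept = [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, n_min)]
    rng.shuffle(kept)
    return kept


def hflip(image):
    """Horizontal flip of an image stored as a list of pixel rows (one simple augmentation)."""
    return [row[::-1] for row in image]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Multi-class focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    With gamma=0 and alpha=None this reduces to ordinary cross-entropy;
    passing per-class weights as alpha gives a class-weighted loss.
    """
    p_t = softmax(logits)[target]
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def soft_vote(prob_lists):
    """Ensemble by averaging the class-probability vectors of several models."""
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]


if __name__ == "__main__":
    # Hypothetical toy labels: class 0 is the majority, class 2 the rarest.
    labels = [0] * 6 + [1] * 3 + [2] * 1
    samples = [f"img_{i}" for i in range(len(labels))]
    weights = inverse_frequency_weights(labels)
    print(weights)
    print(random_undersample(samples, labels))
    print(focal_loss([0.5, 0.2, 1.5], target=2, alpha=weights))
    print(soft_vote([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]))
```

The weighting and the loss are algorithm-level changes, while undersampling and augmentation change the data the network sees. Setting gamma = 0 and passing the inverse-frequency weights as alpha turns the focal loss into class-weighted cross-entropy, which shows how closely "weight balancing" and "focal loss" are related.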
ISSN:	1083-3668; 1560-2281
DOI:	10.1117/1.JBO.26.10.105001