A Preliminary Study on Detecting Voice Disorders Using Deep Learning in Laryngology and Phoniatrics

Bibliographic Details
Published in	Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics (대한후두음성언어의학회지), 36(1), pp. 5-11
Main Authors	김광현, 조재근
Format	Journal Article
Language	Korean
Published	The Korean Society of Laryngology, Phoniatrics and Logopedics (대한후두음성언어의학회), 01.04.2025
Subjects	Otorhinolaryngology
ISSN	2508-268X 2508-5603

Summary:	Background and Objectives: Voice disorders can significantly impair quality of life. This study evaluates the feasibility of detecting voice disorders with deep learning models trained on an open-source dataset.
Materials and Method: We used the Saarbrücken Voice Database, which contains 1231 voice recordings covering various pathologies. The recordings were divided into training (n=1036) and validation (n=195) sets. Key vocal parameters, including fundamental frequency (F0), formants (F1, F2), harmonics-to-noise ratio, jitter, and shimmer, were analyzed. A convolutional neural network (CNN) was designed to classify voice recordings as normal, vox senilis, or laryngocele. Performance was assessed using precision, recall, F1-score, and accuracy.
Results: The CNN model demonstrated high classification performance, with precision, recall, and F1-scores of 1.00 for normal and 0.99 for vox senilis and laryngocele. Accuracy reached 1.00 after 50 epochs and remained stable through 100 epochs. Time-frequency analysis supported the model's ability to differentiate between classes.
Conclusion: This study highlights the potential of deep learning for voice disorder detection, achieving high accuracy and precision. Future research should address dataset diversity and real-world integration to support broader clinical adoption.
KCI Citation Count:	0
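For readers unfamiliar with the metrics named in the summary, the following is a minimal sketch of how per-class precision, recall, and F1-score, plus overall accuracy, are conventionally computed for a three-class problem. It is not the authors' code. The helper names (`per_class_metrics`, `accuracy`) and the ten toy labels are illustrative assumptions and do not come from the study or from the Saarbrücken Voice Database.

```python
from collections import Counter

# Class names taken from the summary; everything else here is illustrative.
CLASSES = ["normal", "vox senilis", "laryngocele"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return precision, recall, and F1 for each class (one-vs-rest)."""
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted) pairs
    metrics = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of recordings whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (hypothetical, NOT study data): one vox senilis recording
# is misclassified as laryngocele.
y_true = ["normal"] * 4 + ["vox senilis"] * 3 + ["laryngocele"] * 3
y_pred = (["normal"] * 4
          + ["vox senilis", "vox senilis", "laryngocele"]
          + ["laryngocele"] * 3)

report = per_class_metrics(y_true, y_pred)
overall = accuracy(y_true, y_pred)

for name, m in report.items():
    print(f"{name}: P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
print(f"accuracy: {overall:.2f}")
```

In this toy case a single misclassification lowers recall for vox senilis and precision for laryngocele, which is why per-class scores are reported alongside overall accuracy.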