A Preliminary Study on Deep Learning-Based Voice Disorder Detection in Laryngology
| Field | Value |
|---|---|
| Published in | 대한후두음성언어의학회지 (Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics), 36(1), pp. 5-11 |
| Format | Journal Article |
| Language | Korean |
| Published | 대한후두음성언어의학회 (Korean Society of Laryngology, Phoniatrics and Logopedics), 01.04.2025 |
| ISSN | 2508-268X; 2508-5603 |
Summary:
Background and Objectives: Voice disorders can significantly impair quality of life. This study evaluates the feasibility of using deep learning models to detect voice disorders in an open-source dataset.
Materials and Methods: We used 1231 voice recordings from the Saarbrücken Voice Database, covering normal voices and various pathologies. The recordings were divided into a training set (n=1036) and a validation set (n=195). Key vocal parameters were analyzed, including fundamental frequency (F0), formants (F1, F2), harmonics-to-noise ratio (HNR), jitter, and shimmer. A convolutional neural network (CNN) was designed to classify the recordings into three classes: normal, vox senilis, and laryngocele. Performance was assessed using precision, recall, F1-score, and accuracy.
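The abstract lists jitter and shimmer among the analyzed parameters but does not say how they were computed. As an illustration only, the sketch below uses the common "local" definitions reported by tools such as Praat: the mean absolute difference between consecutive glottal cycles divided by the mean over all cycles, i.e. jitter = [(1/(N-1)) Σ|T_i - T_(i+1)|] / [(1/N) Σ T_i] for cycle periods T_i, and the same formula on cycle peak amplitudes A_i for shimmer. That these match the study's feature extraction is an assumption. The function names are hypothetical, the inputs (per-cycle periods and amplitudes, e.g. from a pitch tracker) are assumed to be already available, and mean F0 is approximated as the reciprocal of the mean period.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, divided by the mean value.

    This is the "local" jitter/shimmer definition used by tools such as Praat;
    it is an assumption here, not the study's documented method.
    """
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def voice_measures(periods_s, amplitudes):
    """Return approximate mean F0 (Hz), local jitter and local shimmer (fractions).

    periods_s:  durations of consecutive glottal cycles in seconds (hypothetical input)
    amplitudes: peak amplitude of each of those cycles (hypothetical input)
    """
    if len(periods_s) != len(amplitudes):
        raise ValueError("one amplitude per period is required")
    return {
        "f0_hz": 1.0 / mean(periods_s),  # reciprocal of the mean period
        "jitter_local": local_perturbation(periods_s),
        "shimmer_local": local_perturbation(amplitudes),
    }


if __name__ == "__main__":
    # Illustrative cycle data, not taken from the Saarbrücken Voice Database.
    periods = [0.0100, 0.0102, 0.0099, 0.0101]
    amps = [0.80, 0.78, 0.81, 0.79]
    m = voice_measures(periods, amps)
    print(f"F0 {m['f0_hz']:.1f} Hz, jitter {m['jitter_local']:.2%}, "
          f"shimmer {m['shimmer_local']:.2%}")
```

A perfectly periodic signal with constant amplitude yields zero jitter and zero shimmer; cycle-to-cycle irregularity raises both values.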
Results: The CNN model achieved high classification performance, with precision, recall, and F1-scores of 1.00 for the normal class and 0.99 for both vox senilis and laryngocele. Accuracy reached 1.00 after 50 epochs and remained stable through 100 epochs. Time-frequency analysis supported the model's ability to differentiate between the classes.
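Per-class scores like those reported above are conventionally computed one-vs-rest: precision = TP/(TP+FP), recall = TP/(TP+FN), F1 is the harmonic mean of the two, and accuracy is the fraction of all recordings classified correctly. A minimal sketch of that computation follows; the function name and the five-item label lists are hypothetical toy data, not the study's predictions, and the study's own evaluation code is not described in the abstract.

```python
from collections import Counter


def per_class_report(y_true, y_pred, labels):
    """Per-class precision, recall and F1 (one-vs-rest), plus overall accuracy."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    predicted = Counter(y_pred)  # TP + FP per class
    actual = Counter(y_true)     # TP + FN per class
    report = {}
    for c in labels:
        precision = tp[c] / predicted[c] if predicted[c] else 0.0
        recall = tp[c] / actual[c] if actual[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(tp.values()) / len(y_true)
    return report, accuracy


if __name__ == "__main__":
    labels = ["normal", "vox senilis", "laryngocele"]
    # Toy labels for illustration only.
    y_true = ["normal", "normal", "vox senilis", "laryngocele", "laryngocele"]
    y_pred = ["normal", "normal", "vox senilis", "vox senilis", "laryngocele"]
    report, acc = per_class_report(y_true, y_pred, labels)
    for c in labels:
        r = report[c]
        print(f"{c:12s} P={r['precision']:.2f} R={r['recall']:.2f} F1={r['f1']:.2f}")
    print(f"accuracy={acc:.2f}")
```

With these toy labels the normal class scores 1.00 on every metric, while the single laryngocele recording predicted as vox senilis lowers vox senilis precision and laryngocele recall to 0.50 and overall accuracy to 0.80.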
Conclusion: This study highlights the potential of deep learning for voice disorder detection, with high accuracy and precision. Future research should address dataset diversity and real-world integration to support broader clinical adoption.
KCI Citation Count: 0