Comparison of an audio-based and a video-based approach for detecting diplophonia
Published in | Biomedical signal processing and control Vol. 31; pp. 576 - 585 |
---|---|
Main Authors | , , , , , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.01.2017 |
Subjects | |
Summary: | • Diplophonia is investigated by means of audio and laryngeal high-speed video. • The audio analysis is excellent in separating diplophonia from modal phonation. • The video analysis outperforms the audio analysis in separating diplophonia from dysphonic phonation. • Glottal diplophonia and auditive diplophonia should be distinguished.
Diplophonia is a common symptom in voice disorders. Depending on the underlying aetiology, diplophonic patients typically need treatment such as phonosurgery or speech therapy. In current clinical practice, the presence of diplophonia is assessed by auditive rating. To avoid subjectivity in voice assessment and to follow the principles of evidence-based medicine, objective instrumental assessment methods are needed. To gain insight into the instrumental assessment of diplophonic voice, different assessment approaches must be compared. The aim of the study is to compare two independent objective approaches with respect to their ability to detect diplophonia. The compared approaches are the previously published degree of subharmonics (DSH) and a newly proposed measure for spatial bimodality of the vocal fold vibration.
From a clinical database of 352 laryngeal high-speed videos with synchronous audio recordings, 60 phonation segments (20 euphonic, 20 diplophonic and 20 non-diplophonic dysphonic) were auditively selected. For all phonation segments, the DSH and the newly proposed measure for spatial bimodality were determined. The DSH is the occurrence rate, in percent, of audio analysis blocks with ambiguous fundamental frequency. The bimodality measure quantifies the spatial occurrence of secondary oscillation frequencies along the vocal folds’ edges. Both the DSH and the bimodality measure were evaluated on their ability to detect diplophonia by means of cut-off threshold classification.
The DSH showed excellent classification rates for separating diplophonic from euphonic phonation (sensitivity: 98.4%, specificity: 100%). In separating diplophonic from non-diplophonic dysphonic phonation, the bimodality measure slightly outperformed the DSH approach (sensitivity: 54.6%, specificity: 92.7%). The separation of diplophonia from other kinds of dysphonia is challenging, and more sophisticated methods are needed. It is concluded that auditive and glottal diplophonia must be distinguished. As the clinical assessment of diplophonia primarily aims at determining glottal conditions, the video-based approach might deliver more clinically relevant data than the auditive approach. |
---|---|
ISSN: | 1746-8094 1746-8108 |
DOI: | 10.1016/j.bspc.2014.10.001 |
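The summary defines the DSH as a percentage of analysis blocks and evaluates both measures by cut-off threshold classification. The following Python sketch illustrates those two ideas only; it is not the authors' implementation. The function names, the assumption that each analysis block already carries a boolean "ambiguous fundamental frequency" flag, and all numeric values are hypothetical and chosen purely for illustration.

```python
from typing import Sequence, Tuple


def dsh_percent(ambiguous_flags: Sequence[bool]) -> float:
    """Percentage of analysis blocks flagged as having an ambiguous f0.

    How a block is judged ambiguous is not specified in the summary;
    here the flags are simply taken as given (assumption).
    """
    if not ambiguous_flags:
        raise ValueError("at least one analysis block is required")
    return 100.0 * sum(ambiguous_flags) / len(ambiguous_flags)


def sensitivity_specificity(
    scores: Sequence[float],
    is_diplophonic: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Classify a segment as diplophonic when its score >= threshold,
    then compare the predictions with the reference labels."""
    tp = fn = tn = fp = 0
    for score, positive in zip(scores, is_diplophonic):
        predicted = score >= threshold
        if positive and predicted:
            tp += 1
        elif positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the labels")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example values, not data from the study.
example_scores = [60.0, 10.0, 0.0, 30.0]
example_labels = [True, True, False, False]
sens, spec = sensitivity_specificity(example_scores, example_labels, 20.0)
```

In practice the cut-off would be chosen by sweeping `threshold` over the observed scores and inspecting the resulting sensitivity and specificity pairs.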