Comparison of an audio-based and a video-based approach for detecting diplophonia

Bibliographic Details
Published in	Biomedical Signal Processing and Control, Vol. 31, pp. 576-585
Main Authors	Aichinger, Philipp; Roesner, Imme; Leonhard, Matthias; Schneider-Stickler, Berit; Denk-Linnert, Doris-Maria; Bigenzahn, Wolfgang; Fuchs, Anna Katharina; Hagmüller, Martin; Kubin, Gernot
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.01.2017
Subjects	Audio signal processing; Degree of subharmonics (DSH); Diplophonia; Laryngeal high-speed videos (LHSV); Pathologic voice; Video signal processing; ROC; AUC; NUM; STP

More Information
Summary:	• Diplophonia is investigated by means of audio and laryngeal high-speed video.
• The audio analysis is excellent in separating diplophonia from modal phonation.
• The video analysis outperforms the audio analysis in separating diplophonia from dysphonic phonation.
• Glottal diplophonia and auditive diplophonia should be distinguished.

Diplophonia is a common symptom of voice disorders. Depending on the underlying aetiology, diplophonic patients typically need treatment such as phonosurgery or speech therapy. In current clinical practice, the presence of diplophonia is assessed by auditive rating. To avoid subjectivity in voice assessment and to follow the principles of evidence-based medicine, objective instrumental assessment methods are needed. To gain insight into the instrumental assessment of diplophonic voice, different assessment approaches must be compared. The aim of this study is to compare two independent objective approaches with respect to their ability to detect diplophonia: the previously published degree of subharmonics (DSH) and a newly proposed measure of the spatial bimodality of vocal fold vibration. From a clinical database of 352 laryngeal high-speed videos with synchronous audio recordings, 60 phonation segments (20 euphonic, 20 diplophonic, and 20 non-diplophonic dysphonic) were selected by auditive rating. For all phonation segments, the DSH and the bimodality measure were determined. The DSH is the percentage of audio analysis blocks with an ambiguous fundamental frequency. The bimodality measure quantifies the spatial occurrence of secondary oscillation frequencies along the edges of the vocal folds. Both measures are evaluated on their ability to detect diplophonia by means of cut-off threshold classification.
The DSH showed excellent classification rates for separating diplophonic from euphonic phonation (sensitivity: 98.4%, specificity: 100%). In separating diplophonic from non-diplophonic dysphonic phonation, the bimodality measure slightly outperformed the DSH (sensitivity: 54.6%, specificity: 92.7%). Separating diplophonia from other kinds of dysphonia remains challenging, and more sophisticated methods are needed. It is concluded that auditive and glottal diplophonia must be distinguished. Because the clinical assessment of diplophonia primarily aims at determining glottal conditions, the video-based approach may deliver clinically more relevant data than the auditive approach.
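The summary defines the DSH as a percentage of analysis blocks and evaluates it by cut-off threshold classification, reported as sensitivity and specificity. The Python sketch below illustrates only that arithmetic, under stated assumptions: the per-block "ambiguous fundamental frequency" flags are taken as given input (how the study decides that a block is ambiguous is not described here), and a segment is assumed to be labelled diplophonic when its score is at or above the cut-off. All function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def degree_of_subharmonics(ambiguous_f0: Sequence[bool]) -> float:
    """DSH in percent: share of analysis blocks flagged as having an
    ambiguous fundamental frequency. The flags are ASSUMED to come from
    an upstream f0 analysis that is not modelled here."""
    if len(ambiguous_f0) == 0:
        raise ValueError("need at least one analysis block")
    return 100.0 * sum(ambiguous_f0) / len(ambiguous_f0)


def classify(scores: Sequence[float], threshold: float) -> list:
    """Cut-off threshold classification. ASSUMPTION: a score at or
    above the threshold means 'diplophonic'."""
    return [score >= threshold for score in scores]


def sensitivity_specificity(predicted: Sequence[bool],
                            actual: Sequence[bool]) -> tuple:
    """Return (sensitivity, specificity) for binary labels, where
    True means 'diplophonic'."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both positive and negative reference labels")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Invented per-block flags for three hypothetical phonation segments.
    segments = [
        [True, True, False, True],
        [False, False, False, False],
        [True, False, False, False],
    ]
    reference = [True, False, False]  # hypothetical auditive labels
    dsh = [degree_of_subharmonics(seg) for seg in segments]
    predicted = classify(dsh, threshold=50.0)
    print(dsh, sensitivity_specificity(predicted, reference))
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 - specificity yields an ROC curve; the threshold value of 50.0 above is arbitrary and is not the cut-off used in the study.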
ISSN:	1746-8094, 1746-8108
DOI:	10.1016/j.bspc.2014.10.001