Comparison of an audio-based and a video-based approach for detecting diplophonia


Bibliographic Details
Published in: Biomedical signal processing and control Vol. 31; pp. 576-585
Main Authors: Aichinger, Philipp; Roesner, Imme; Leonhard, Matthias; Schneider-Stickler, Berit; Denk-Linnert, Doris-Maria; Bigenzahn, Wolfgang; Fuchs, Anna Katharina; Hagmüller, Martin; Kubin, Gernot
Format: Journal Article
Language: English
Published: Elsevier Ltd 01.01.2017
Summary:
• Diplophonia is investigated by means of audio and laryngeal high-speed video.
• The audio analysis is excellent in separating diplophonia from modal phonation.
• The video analysis outperforms the audio analysis in separating diplophonia from dysphonic phonation.
• Glottal diplophonia and auditive diplophonia should be distinguished.

Diplophonia is a common symptom in voice disorders. Depending on the underlying aetiology, diplophonic patients typically need treatment such as phonosurgery or speech therapy. In current clinical practice, the presence of diplophonia is assessed by auditive rating. To avoid subjectivity in voice assessment and to follow the principles of evidence-based medicine, objective instrumental assessment methods are needed. To gain insight into the instrumental assessment of diplophonic voice, comparisons between different assessment approaches are necessary. The aim of the study is to compare two independent objective approaches with respect to their ability to detect diplophonia. The compared approaches are the previously published degree of subharmonics (DSH) and a newly proposed measure of the spatial bimodality of the vocal fold vibration. From a clinical database of 352 laryngeal high-speed videos with synchronous audio recordings, 60 phonation segments (20 euphonic, 20 diplophonic and 20 non-diplophonic dysphonic) were auditively selected. For all phonation segments, the DSH and the newly proposed bimodality measure were determined. The DSH is the occurrence rate, in percent, of audio analysis blocks with ambiguous fundamental frequency. The bimodality measure quantifies the spatial occurrence of secondary oscillation frequencies along the vocal folds' edges. Both the DSH and the bimodality measure are evaluated on their ability to detect diplophonia by means of cut-off threshold classification.
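The DSH definition and the cut-off step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the per-block ambiguity flags are assumed to come from some fundamental-frequency analysis that is not specified here, and the function names, the `>=` comparison and the example threshold of 10 percent are hypothetical.

```python
def degree_of_subharmonics(ambiguous_flags):
    """Return the percentage of analysis blocks flagged as having an
    ambiguous fundamental frequency (a DSH-style occurrence rate).

    ambiguous_flags: iterable of booleans, one per audio analysis block.
    How a block is judged ambiguous is outside this sketch (assumption).
    """
    flags = [bool(f) for f in ambiguous_flags]
    if not flags:
        raise ValueError("at least one analysis block is required")
    return 100.0 * sum(flags) / len(flags)


def classify_by_cutoff(score, threshold):
    """Label a segment as diplophonic when its score reaches the cut-off.

    The direction of the comparison (>=) is an assumption for illustration.
    """
    return score >= threshold


# Hypothetical segment: 3 of 4 blocks flagged as ambiguous.
example_flags = [True, False, True, True]
example_dsh = degree_of_subharmonics(example_flags)
example_label = classify_by_cutoff(example_dsh, threshold=10.0)
```

The same cut-off function could be applied to the bimodality measure, since the summary states that both measures are evaluated by threshold classification.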
The DSH showed excellent classification rates for separating diplophonic from euphonic phonation (sensitivity: 98.4%, specificity: 100%). In separating diplophonic from non-diplophonic dysphonic phonation, the bimodality measure slightly outperformed the DSH approach (sensitivity: 54.6%, specificity: 92.7%). The separation of diplophonia from other kinds of dysphonia is challenging, and more sophisticated methods are needed. It is concluded that auditive and glottal diplophonia must be distinguished. As the clinical assessment of diplophonia primarily aims at determining glottal conditions, the video-based approach might deliver clinically more relevant data than the auditive approach.
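For readers unfamiliar with the reported figures, sensitivity and specificity can be computed from reference labels and classifier decisions as in the sketch below. The function name and the toy labels are hypothetical and do not reproduce the study's data; the function returns fractions, which would be multiplied by 100 to obtain percentages like those quoted above.

```python
def sensitivity_specificity(is_diplophonic, predicted_diplophonic):
    """Compute (sensitivity, specificity) as fractions in [0, 1].

    is_diplophonic: reference labels (True = diplophonic segment).
    predicted_diplophonic: classifier decisions for the same segments.
    """
    truth = [bool(t) for t in is_diplophonic]
    pred = [bool(p) for p in predicted_diplophonic]
    if len(truth) != len(pred):
        raise ValueError("label and prediction lists must have equal length")

    tp = sum(t and p for t, p in zip(truth, pred))          # hits
    fn = sum(t and not p for t, p in zip(truth, pred))      # misses
    tn = sum(not t and not p for t, p in zip(truth, pred))  # correct rejections
    fp = sum(not t and p for t, p in zip(truth, pred))      # false alarms

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the reference labels")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): two diplophonic and two non-diplophonic segments.
toy_truth = [True, True, False, False]
toy_pred = [True, False, False, False]
toy_sens, toy_spec = sensitivity_specificity(toy_truth, toy_pred)
```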
ISSN:1746-8094
1746-8108
DOI:10.1016/j.bspc.2014.10.001