VocDoc, what happened to my voice? Towards automatically capturing vocal fatigue in the wild

Bibliographic Details
Published in Biomedical signal processing and control Vol. 88; p. 105595
Main Authors Pokorny, Florian B.; Linke, Julian; Seddiki, Nico; Lohrmann, Simon; Gerstenberger, Claus; Haspl, Katja; Feiner, Marlies; Eyben, Florian; Hagmüller, Martin; Schuppler, Barbara; Kubin, Gernot; Gugatschka, Markus
Format Journal Article
Language English
Published Elsevier Ltd 01.02.2024
Summary: Voice problems that arise during everyday vocal use can hardly be captured by standard outpatient voice assessments. In preparation for a digital health application to automatically assess longitudinal voice data ‘in the wild’ (the VocDoc), the aim of this paper was to study vocal fatigue from the speaker’s perspective, the healthcare professional’s perspective, and the ‘machine’s’ perspective. We collected data from four voice-healthy speakers completing a 90-min reading task. Every 10 min the speakers were asked about subjective voice characteristics. We then addressed the task of elapsed speaking time recognition: we carried out listening experiments with speech and language therapists and trained random forests on extracted acoustic features. We validated our models speaker-dependently and speaker-independently and analysed the underlying feature importances. For an additional, clinical application-oriented scenario, we extended our dataset with lecture recordings of another two speakers. Self-assessments and expert assessments were not consistent. With mean F1 scores up to 0.78, automatic elapsed speaking time recognition worked reliably in the speaker-dependent scenario only. A small set of acoustic features – other than features previously reported to reflect vocal fatigue – was found to universally describe long-term variations of the voice. Vocal fatigue seems to have individual effects across different speakers. Machine learning has the potential to automatically detect and characterise vocal changes over time. Our study provides technical underpinnings for a future mobile solution to objectively capture pathological long-term voice variations in everyday life settings and make them clinically accessible.
• A few acoustic features seem to universally describe vocal fatigue.
• Vocal fatigue has rather individual effects across different speakers.
• Machine learning has the potential to automatically detect effects of vocal fatigue.
• A mobile app can capture clinically relevant long-term voice variations in the wild.
ISSN: 1746-8094
DOI: 10.1016/j.bspc.2023.105595
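
The summary distinguishes speaker-dependent from speaker-independent validation and reports mean F1 scores. The sketch below illustrates that distinction only; it is not the authors' pipeline. All function names are hypothetical. The round-robin fold assignment and the unweighted (macro) averaging of per-class F1 are assumptions, since the record does not say how folds were formed or how F1 was averaged. The classifier itself (random forests in the paper) is left out.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (averaging mode is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def speaker_independent_splits(speakers):
    """Leave one speaker out: test on that speaker, train on all others."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


def speaker_dependent_splits(speakers, n_folds=2):
    """Train and test within one speaker; samples are dealt round-robin into folds."""
    by_speaker = defaultdict(list)
    for i, s in enumerate(speakers):
        by_speaker[s].append(i)
    for speaker in sorted(by_speaker):
        idx = by_speaker[speaker]
        for k in range(n_folds):
            test = idx[k::n_folds]
            train = [i for i in idx if i not in test]
            yield speaker, train, test
```

In use, each (train, test) index pair would select rows of an acoustic feature matrix and the matching elapsed-time labels, a model would be fitted on the training rows, and macro_f1 would score its predictions on the test rows. Comparing the scores from the two split functions mirrors the comparison the summary reports.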