VocDoc, what happened to my voice? Towards automatically capturing vocal fatigue in the wild
Published in | Biomedical Signal Processing and Control, Vol. 88, Article 105595 |
---|---|
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.02.2024 |
Summary: Voice problems that arise during everyday vocal use can hardly be captured by standard outpatient voice assessments. In preparation for the VocDoc, a digital health application to automatically assess longitudinal voice data ‘in the wild’, the aim of this paper was to study vocal fatigue from the speaker’s perspective, the healthcare professional’s perspective, and the ‘machine’s’ perspective.
We collected data from four vocally healthy speakers completing a 90-minute reading task. Every 10 minutes, the speakers were asked about subjective voice characteristics. We then elaborated on the task of elapsed speaking time recognition: we carried out listening experiments with speech and language therapists and employed random forests on extracted acoustic features. We validated our models speaker-dependently and speaker-independently and analysed the underlying feature importances. For an additional, application-oriented clinical scenario, we extended our dataset with lecture recordings of two further speakers.
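The abstract names the modelling ingredients (acoustic features, random forests, speaker-dependent and speaker-independent validation) without implementation detail. The following is a minimal sketch of how such a setup could look with scikit-learn; the toy data shapes, the number of analysis windows per 10-minute bin, and all variable names are assumptions for illustration, not the authors' actual pipeline.

```python
# Minimal sketch of elapsed speaking time recognition; not the authors' exact pipeline.
# Assumes acoustic features have already been extracted for short analysis windows,
# with one label per window giving the 10-minute bin it came from (0..8 over 90 min)
# and the speaker it belongs to. All dimensions and names here are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_speakers, n_bins, n_windows, n_features = 4, 9, 30, 88    # toy dimensions only
X = rng.normal(size=(n_speakers * n_bins * n_windows, n_features))
y = np.tile(np.repeat(np.arange(n_bins), n_windows), n_speakers)  # elapsed-time bin
speakers = np.repeat(np.arange(n_speakers), n_bins * n_windows)   # speaker id per row

clf = RandomForestClassifier(n_estimators=500, random_state=0)

# Speaker-dependent validation: cross-validate within each speaker's own recordings.
for s in np.unique(speakers):
    mask = speakers == s
    scores = cross_val_score(
        clf, X[mask], y[mask],
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
        scoring="f1_macro",
    )
    print(f"speaker {s}: mean macro F1 = {scores.mean():.2f}")

# Speaker-independent validation: train on three speakers, test on the held-out one.
scores = cross_val_score(clf, X, y, groups=speakers,
                         cv=LeaveOneGroupOut(), scoring="f1_macro")
print(f"speaker-independent mean macro F1 = {scores.mean():.2f}")
```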
Self-assessments and expert assessments were not consistent. With mean F1 scores of up to 0.78, automatic elapsed speaking time recognition worked reliably only in the speaker-dependent scenario. A small set of acoustic features – other than the features previously reported to reflect vocal fatigue – was found to universally describe long-term variations of the voice.
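The abstract reports that a small set of acoustic features describes long-term voice variation across speakers. One way such a ranking could be obtained is to average impurity-based importances over per-speaker forests; the snippet below is an illustrative continuation of the sketch above (it reuses its X, y, and speakers arrays and placeholder feature names) and is not necessarily the authors' procedure.

```python
# Illustrative continuation of the previous sketch (reuses X, y, speakers).
# Rank features by mean impurity-based importance across per-speaker forests.
feature_names = [f"feat_{i}" for i in range(X.shape[1])]   # placeholder feature names

per_speaker_importances = []
for s in np.unique(speakers):
    mask = speakers == s
    forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X[mask], y[mask])
    per_speaker_importances.append(forest.feature_importances_)

mean_importance = np.mean(per_speaker_importances, axis=0)
for i in np.argsort(mean_importance)[::-1][:10]:           # ten highest-ranked features
    print(f"{feature_names[i]}: {mean_importance[i]:.3f}")
```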
Vocal fatigue seems to have individual effects across different speakers. Machine learning has the potential to automatically detect and characterise vocal changes over time.
Our study provides technical underpinnings for a future mobile solution to objectively capture pathological long-term voice variations in everyday life settings and make them clinically accessible.
• A few acoustic features seem to universally describe vocal fatigue.
• Vocal fatigue has rather individual effects across different speakers.
• Machine learning has the potential to automatically detect effects of vocal fatigue.
• A mobile app can capture clinically relevant long-term voice variations in the wild.
ISSN: 1746-8094
DOI: 10.1016/j.bspc.2023.105595