A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning

Bibliographic Details
Published in: arXiv.org
Main Authors: Tang, Jiyang; Chen, William; Chang, Xuankai; Watanabe, Shinji; MacWhinney, Brian
Format: Paper; Journal Article
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 19.05.2023
Summary: Aphasia is a language disorder that affects the speaking ability of millions of patients. This paper presents a new benchmark for aphasia speech recognition and detection tasks using state-of-the-art speech recognition techniques with the AphasiaBank dataset. Specifically, we introduce two multi-task learning methods based on the CTC/Attention architecture to perform both tasks simultaneously. Our system achieves state-of-the-art speaker-level detection accuracy (97.3%) and a relative WER reduction of 11% for patients with moderate aphasia. In addition, we demonstrate the generalizability of our approach by applying it to another disordered speech database, the DementiaBank Pitt corpus. We will make our all-in-one recipes and pre-trained model publicly available to facilitate reproducibility. Our standardized data preprocessing pipeline and open-source recipes enable researchers to compare results directly, promoting progress in disordered speech processing.
ISSN: 2331-8422
DOI: 10.48550/arxiv.2305.13331
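
The summary reports a relative WER reduction of 11%. As a reading aid, the following minimal Python sketch shows how word error rate (WER) and a relative reduction are conventionally computed. It is not the authors' evaluation code. The function names `word_error_rate` and `relative_wer_reduction` are hypothetical, and the WER values in the usage lines are made-up illustrations, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (assumed, not taken from the paper):
# a drop from 40.0% to 35.6% WER is an 11% relative reduction.
example_reduction = relative_wer_reduction(0.400, 0.356)
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A relative reduction is measured against the baseline error rate, so an 11% relative reduction corresponds to a smaller absolute change in WER whenever the baseline is below 100%.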