A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning

Bibliographic Details
Published in	arXiv.org
Main Authors	Tang, Jiyang; Chen, William; Chang, Xuankai; Watanabe, Shinji; MacWhinney, Brian
Format	Paper; Journal Article
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 19.05.2023
Subjects	Benchmarks; Computer Science - Computation and Language; Learning; Speech processing; Speech recognition

Summary:	Aphasia is a language disorder that affects the speaking ability of millions of patients. This paper presents a new benchmark for Aphasia speech recognition and detection tasks using state-of-the-art speech recognition techniques with the AphasiaBank dataset. Specifically, we introduce two multi-task learning methods based on the CTC/Attention architecture to perform both tasks simultaneously. Our system achieves state-of-the-art speaker-level detection accuracy (97.3%) and a relative WER reduction of 11% for moderate Aphasia patients. In addition, we demonstrate the generalizability of our approach by applying it to another disordered speech database, the DementiaBank Pitt corpus. We will make our all-in-one recipes and pre-trained model publicly available to facilitate reproducibility. Our standardized data preprocessing pipeline and open-source recipes enable researchers to compare results directly, promoting progress in disordered speech processing.
ISSN:	2331-8422
DOI:	10.48550/arxiv.2305.13331