An End-to-End Speech Accent Recognition Method Based on Hybrid CTC/Attention Transformer ASR

Bibliographic Details
Published in Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 7253-7257
Main Authors Gao, Qiang; Wu, Haiwei; Sun, Yanqing; Duan, Yitao
Format Conference Proceeding
Language English
Published IEEE 06.06.2021
Summary: This paper proposes a novel accent recognition system built on a transformer-based end-to-end speech recognition framework. To incorporate pronunciation and linguistic knowledge into the network, we first pre-train an ASR model in a hybrid CTC/attention manner. Then, focusing on accent recognition, we extend the output token list by inserting accent labels into the transcripts and fine-tune the network parameters on an accented speech dataset. Our work is evaluated on the Interspeech 2020 Accented English Speech Recognition Challenge. Experiments show that our method achieves an accuracy of 72.39% on the test set and 80.98% on the development set, outperforming the baseline system by a large margin. Our submitted system ranked second in the accent recognition task of the challenge.
ISSN: 2379-190X
DOI: 10.1109/ICASSP39728.2021.9414082
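
The summary's central step, extending the output token list with accent labels and inserting those labels into the transcripts, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The accent tags, the toy vocabulary, the helper names, and the choice to place the tag at the start of the transcript are all assumptions made for illustration; the record does not specify any of them.

```python
from typing import Dict, List, Optional

# Hypothetical accent tags; the record does not list the actual label set.
ACCENT_TAGS = ["<US>", "<UK>", "<IND>"]


def extend_vocab(vocab: Dict[str, int], tags: List[str]) -> Dict[str, int]:
    """Return a copy of vocab with each accent tag appended as a new token.

    Assumes existing ids are contiguous from 0, so len() gives the next free id.
    """
    extended = dict(vocab)
    for tag in tags:
        if tag not in extended:
            extended[tag] = len(extended)
    return extended


def tag_transcript(tokens: List[str], accent_tag: str) -> List[str]:
    """Insert the accent tag into a transcript.

    Prepending is an assumption; the summary does not say where the tag goes.
    """
    return [accent_tag] + tokens


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids for use as fine-tuning targets."""
    return [vocab[token] for token in tokens]


def accent_from_hypothesis(tokens: List[str], tags: List[str]) -> Optional[str]:
    """Read the predicted accent off a decoded token sequence, if present."""
    for token in tokens:
        if token in tags:
            return token
    return None


# Toy vocabulary standing in for the pre-trained ASR model's token list.
base_vocab = {"<blank>": 0, "<sos/eos>": 1, "hello": 2, "world": 3}
vocab = extend_vocab(base_vocab, ACCENT_TAGS)

target_tokens = tag_transcript(["hello", "world"], "<UK>")
target_ids = encode(target_tokens, vocab)
predicted = accent_from_hypothesis(target_tokens, ACCENT_TAGS)
```

Under this scheme the fine-tuned network emits the accent label as an ordinary output token, so accent recognition reduces to reading that token from the decoded sequence.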