The USTC-Nercslip Systems for the ICMC-ASR Challenge

Bibliographic Details
Published in	2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), pp. 3-4
Main Authors	Wu, Minghui; Xu, Luzhen; Zhang, Jie; Tang, Haitao; Yue, Yanyan; Liao, Ruizhi; Zhao, Jintao; Zhang, Zhengzhe; Wang, Yichi; Yan, Haoyin; Yu, Hongliang; Ma, Tongle; Liu, Jiachen; Wu, Chongliang; Li, Yongchao; Zhang, Yanyong; Fang, Xin; Zhang, Yue
Format	Conference Proceeding
Language	English
Published	IEEE 14.04.2024
Subjects	Acoustics; Array signal processing; Conferences; Data models; ICMC-ASR challenge; Iterative methods; Linguistics; multi-channel beamforming; pseudo-label; Self-supervised learning; speaker diarization

Summary:	This report describes the system submitted to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which addresses ASR with overlapping multi-speaker speech and Mandarin accent variation in the in-car multi-channel setting. For the front end, we implement speaker diarization using multi-speaker embeddings built on self-supervised learning representations, and beamforming using the speaker position. For ASR, we employ an iterative pseudo-label generation method based on a fusion model to obtain text labels for unlabeled data. To mitigate the impact of accent, we propose an Accent-ASR framework that captures pronunciation-related accent features at a fine-grained level and linguistic information at a coarse-grained level. On the ICMC-ASR eval set, the proposed system achieves a CER of 13.16% on track 1 and a cpCER of 21.48% on track 2, significantly outperforming the official baseline system and ranking first on both tracks.
DOI:	10.1109/ICASSPW62465.2024.10627164