Semi-Supervised Training of Transformer and Causal Dilated Convolution Network with Applications to Speech Topic Classification

Bibliographic Details
Published in	Applied sciences Vol. 11; no. 12; p. 5712
Main Authors	Zeng, Jinxiang; Zhang, Du; Li, Zhiyi; Li, Xiaolin
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.06.2021
Subjects	Accuracy; Acoustics; Audio data; automatic speech recognition; Classification; Convolution; Deep learning; Neural networks; Noise; semi-supervised learning; semi-supervised training; Speech; Speech recognition; topic classification; Transformer and Causal Dilated Convolution Network; Voice recognition

More Information
Summary:	To address the problem of recognizing audio events in speech, a decision fusion method based on the Transformer and Causal Dilated Convolutional Network (TCDCN) framework is proposed. The method can model sound events over long durations, capture temporal correlations, and effectively handle the sparsity of audio data. The dataset consists of audio clips cropped from YouTube. To identify audio topics reliably and stably, we compare different features and different loss functions to find the best model configuration. Experimental results across different test models show that the proposed TCDCN model achieves better recognition results than neural-network classification and other fusion methods.
ISSN:	2076-3417
DOI:	10.3390/app11125712
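The summary names two building blocks without detailing them: causal dilated convolution and decision fusion. The following is a minimal NumPy sketch of both, for illustration only. It assumes a single-channel input sequence, and it assumes "decision fusion" means a weighted average of two models' softmax outputs; the function names and the `alpha` weight are hypothetical and are not taken from the paper.

```python
import numpy as np


def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution over a single-channel sequence.

    y[t] = sum_k w[k] * x[t - k * dilation], with x treated as zero before
    t = 0, so each output frame depends only on current and past frames.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    pad = (len(w) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad only: no future leakage
    y = np.zeros(len(x))
    for t in range(len(x)):
        for k in range(len(w)):
            y[t] += w[k] * xp[pad + t - k * dilation]
    return y


def receptive_field(kernel_size, dilations):
    """Frames seen by the last output of a stack of causal dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def softmax(z):
    z = np.asarray(z, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def decision_fusion(logits_a, logits_b, alpha=0.5):
    """Late fusion: weighted average of two models' class probabilities.

    alpha is a hypothetical fusion weight; the record does not state the rule.
    Returns (predicted class indices, fused probabilities).
    """
    probs = alpha * softmax(logits_a) + (1.0 - alpha) * softmax(logits_b)
    return probs.argmax(axis=-1), probs


if __name__ == "__main__":
    # Impulse response of one layer with kernel size 2 and dilation 2.
    print(causal_dilated_conv1d([1, 0, 0, 0, 0, 0], [1, 1], dilation=2))
    # Doubling dilations grow the receptive field exponentially with depth.
    print(receptive_field(2, [1, 2, 4, 8]))
    labels, probs = decision_fusion([[2.0, 0.0]], [[0.0, 1.0]])
    print(labels, probs)
```

With kernel size 2 and dilations 1, 2, 4, 8, four layers already cover 16 frames, which is one common way causal dilated networks reach long temporal context cheaply. The Transformer branch and the feature and loss-function comparison described in the summary are not shown.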