Recognition of English speech – using a deep learning algorithm

Bibliographic Details
Published in	Journal of intelligent systems Vol. 32; no. 1; pp. 225 - 37
Main Author	Wang, Shuyan
Format	Journal Article
Language	English
Published	Berlin: De Gruyter (Walter de Gruyter GmbH), 23.02.2023
Subjects	68T07; Accuracy; Algorithms; Artificial neural networks; connectionist temporal classification; Deep learning; English; Machine learning; Machine translation; Markov chains; Neural networks; Probabilistic models; recurrent neural network; Recurrent neural networks; Speech; Speech recognition; Testing time; Training; Voice recognition
Summary:	Accurate speech recognition benefits machine translation and intelligent human–computer interaction. After briefly introducing speech recognition algorithms, this study proposed recognizing speech with a recurrent neural network (RNN) and adopted the connectionist temporal classification (CTC) algorithm to forcibly align input speech sequences with output text sequences. Simulation experiments compared the RNN-CTC algorithm with the Gaussian mixture model–hidden Markov model (GMM-HMM) and convolutional neural network (CNN)-CTC algorithms. The results showed that more training samples gave the trained algorithm higher recognition accuracy but gradually increased training time, whereas more test samples lowered recognition accuracy and lengthened testing time. With the same number of training and testing samples, the proposed RNN-CTC algorithm always had the highest accuracy and the shortest training and testing times of the three algorithms.
ISSN:	2191-026X, 0334-1860
DOI:	10.1515/jisys-2022-0236
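
The summary mentions that CTC aligns per-frame network outputs with a shorter text sequence. As a rough illustration of that idea only, the sketch below shows the standard CTC collapsing rule (merge consecutive repeated labels, then drop blanks) together with a simple greedy decoder. It is not taken from the article: the blank symbol "_", the function names, and the toy probabilities are assumptions made for this example, and the article's actual model and decoding procedure are not described in this record.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol; chosen for this illustration only


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a per-frame label path to an output string.

    Standard CTC rule: merge runs of identical labels first,
    then remove the blank symbol.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


def greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Pick the most probable label in each frame, then collapse the path.

    frame_probs: one list of probabilities per frame, ordered like `alphabet`.
    This is best-path decoding, a common simplification; it is not
    guaranteed to return the most probable output string overall.
    """
    best_path = []
    for row in frame_probs:
        best_index = max(range(len(row)), key=row.__getitem__)
        best_path.append(alphabet[best_index])
    return ctc_collapse(best_path, blank)


if __name__ == "__main__":
    # A blank between the two "l" runs keeps the double letter.
    print(ctc_collapse(list("hh_e_ll_ll_oo")))

    toy_alphabet = [BLANK, "a", "b"]
    toy_probs = [
        [0.1, 0.8, 0.1],
        [0.1, 0.7, 0.2],
        [0.9, 0.05, 0.05],
        [0.2, 0.1, 0.7],
    ]
    print(greedy_decode(toy_probs, toy_alphabet))
```

The blank symbol is what lets many input frames map to few output characters while still allowing repeated letters: without a blank between them, two adjacent runs of the same label would merge into one character.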