Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language

Bibliographic Details
Published in	Sensors (Basel, Switzerland) Vol. 22; no. 10; p. 3683
Main Authors	Mukhamadiyev, Abdinabi; Khujayarov, Ilyos; Djuraev, Oybek; Cho, Jinsoo
Format	Journal Article
Language	English
Published	Switzerland, MDPI AG, 12.05.2022
Subjects	Acoustics; Attention; Automatic speech recognition; Chinese languages; Computational linguistics; Convolutional neural network; CTC-attention; Datasets; Deep learning; Dialects; End-to-end speech recognition; Error analysis; Globalization; Hidden Markov model; Humans; Japanese language; Language; Language processing; Linguistics; Markov analysis; Markov chains; Markov processes; Methods; Natural language interfaces; Neural networks; Spanish language; Speaking; Speech; Speech perception; Speech recognition; Speech recognition software; Transformers; United Kingdom; Uzbek language; Uzbekistan; Voice recognition
Summary:	Communication has been an important aspect of human life, civilization, and globalization for thousands of years. Biometric analysis, education, security, healthcare, and smart cities are only a few examples of speech recognition applications. Most studies have concentrated on English, Spanish, Japanese, or Chinese and have disregarded low-resource languages such as Uzbek, leaving their analysis open. In this paper, we propose an end-to-end Deep Neural Network-Hidden Markov Model speech recognition model and a hybrid Connectionist Temporal Classification (CTC)-attention network for the Uzbek language and its dialects. The proposed approach reduces training time and improves speech recognition accuracy by effectively using the CTC objective function in attention model training. We evaluated the linguistic and lay native speaker performances on the Uzbek language dataset, which was collected as part of this study. Experimental results show that the proposed model achieved a word error rate of 14.3% using 207 h of recordings as an Uzbek language training dataset.
ISSN:	1424-8220
DOI:	10.3390/s22103683
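
The summary reports a word error rate (WER) of 14.3%. As a reading aid, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer's hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the record does not describe the authors' scoring tooling or text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, scoring the hypothesis `"a x c"` against the reference `"a b c d"` counts one substitution and one deletion over four reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.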