Continuous Kannada Speech Recognition System Under Degraded Condition

Bibliographic Details
Published in	Circuits, Systems, and Signal Processing, Vol. 39, no. 1, pp. 391-419
Main Authors	Praveen Kumar, P. S.; Thimmaraja Yadava, G.; Jayanna, H. S.
Format	Journal Article
Language	English
Published	New York: Springer US; Springer Nature B.V., 2020
Subjects	Acoustic noise; Artificial neural networks; Automatic speech recognition; Circuits and Systems; Deep learning; Electrical Engineering; Electronics and Microelectronics; Engineering; Error analysis; Instrumentation; Interactive systems; Kannada language; Markov analysis; Markov chains; Modelling; Phonemes; Probabilistic models; Regional dialects; Sentences; Signal, Image and Speech Processing; Speech; Speech recognition; Transcription; Transliteration; Voice recognition; Continuous speech data; Automatic speech recognition (ASR); Word error rate (WER); Kaldi tool kit
Summary:	This paper develops a continuous Kannada speech recognition system for different noisy conditions. Continuous Kannada speech sentences were collected from 2400 speakers across different dialect regions of Karnataka state (a state in the southwestern region of India where Kannada is the principal language). Word-level transcription and validation of the speech data were done using the Indic transliteration tool (IT3:UTF-8). The Kaldi toolkit was used to develop automatic speech recognition (ASR) models at different phoneme levels. The lexicon and phoneme set were created afresh for continuous Kannada speech sentences. Of the validated speech data, 80% was used for training and 20% for testing with Kaldi. System performance was measured by word error rate (WER). Acoustic models were built using monophone, triphone1, triphone2, triphone3, subspace Gaussian mixture models (SGMM), a combination of deep neural network (DNN) and hidden Markov model (HMM), a combination of DNN and SGMM, and a combination of SGMM and maximum mutual information. Experiments determined the WER for each modeling technique. The results show that the combination of DNN and HMM outperforms the conventional ASR modeling techniques. An interactive voice response system was developed to build an end-to-end ASR system that recognizes continuous Kannada speech sentences. The developed ASR system was tested by 300 speakers from Karnataka state in an uncontrolled environment.
ISSN:	0278-081X 1531-5878
DOI:	10.1007/s00034-019-01189-9
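
The summary reports system performance as word error rate (WER). As an illustration only, the sketch below computes WER in the usual way: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the placeholder tokens are assumptions made for this example; this is not the authors' code and not Kaldi's own scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not real Kannada data: one substitution and one
    # deletion against a four-word reference.
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.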