Enhancements in automatic Kannada speech recognition system by background noise elimination and alternate acoustic modelling

Bibliographic Details
Published in	International journal of speech technology Vol. 23; no. 1; pp. 149 - 167
Main Authors	Thimmaraja Yadava, G., Jayanna, H. S.
Format	Journal Article
Language	English
Published	New York: Springer US, 01.03.2020; Springer Nature B.V.
Subjects	Acoustic noise; Acoustics; Agricultural commodities; Algorithms; Artificial Intelligence; Automatic speech recognition; Background noise; Deep learning; Engineering; Error analysis; Feature extraction; Kannada language; Modelling; Neural networks; Noise; Noise reduction; Performance enhancement; Pricing; Probabilistic models; Product development; Signal, Image and Speech Processing; Social Sciences; Speech; Speech recognition; Subtraction; Voice activity detectors; Voice recognition; Weather; Minimum mean square error spectrum power estimator based on zero crossing (MMSE-SPZC); Maximum a Posteriori (MAP); Automatic speech recognition (ASR); Spectral subtraction with voice activity detection (SS-VAD); Minimum mean square error spectrum power (MMSE-SP); Interactive voice response system (IVRS)
Summary:	This paper describes in detail the improvements made to a recently implemented Kannada speech recognition system. The Kannada automatic speech recognition (ASR) system consists of ASR models created using Kaldi, an interactive voice response system (IVRS) call flow, and databases of weather and agricultural commodity price information. The task-specific speech data used in the recently developed spoken dialogue system contained high levels of several kinds of background noise, which degraded both online and offline speech recognition performance. To improve recognition accuracy, a noise reduction algorithm was developed that fuses spectral subtraction with voice activity detection (SS-VAD) and a minimum mean square error spectrum power estimator based on zero crossing (MMSE-SPZC). The noise elimination algorithm is applied before the feature extraction stage. Alternative ASR models are created using subspace Gaussian mixture model (SGMM) and deep neural network (DNN) modeling techniques. The experimental results show that combining the noise elimination technique with SGMM/DNN-based modeling yields a relative accuracy improvement of 7.68% over the recently developed GMM-HMM-based ASR system. The acoustic models with the lowest word error rate (WER) can be used in the spoken dialogue system. The developed spoken query system was tested by farmers in Karnataka under uncontrolled conditions.
ISSN:	1381-2416 1572-8110
DOI:	10.1007/s10772-020-09671-5
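
The summary names spectral subtraction with voice activity detection (SS-VAD) as one component of the noise reduction stage that runs before feature extraction. The NumPy sketch below illustrates only the general idea of that component: a textbook magnitude spectral subtraction gated by a simple energy-based VAD. It is not the authors' algorithm, which fuses SS-VAD with an MMSE-SPZC estimator and is not specified in this record. The function name `spectral_subtraction_vad`, the frame length, the VAD threshold, the smoothing constant, and the spectral floor are all assumptions chosen for illustration, as is the premise that the first few frames contain noise only.

```python
import numpy as np


def spectral_subtraction_vad(noisy, frame_len=256, init_noise_frames=5,
                             vad_factor=2.0, smooth=0.9, floor=0.01):
    """Basic magnitude spectral subtraction gated by an energy-based VAD.

    Illustrative sketch only; all parameter values are assumptions.
    """
    noisy = np.asarray(noisy, dtype=float)
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by half a frame sum to one,
    # so plain overlap-add reconstructs the interior of the signal.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mags, phases = np.abs(spectra), np.angle(spectra)
    energies = np.sum(frames ** 2, axis=1)

    # Assumption: the first few frames contain background noise only.
    noise_mag = mags[:init_noise_frames].mean(axis=0)
    noise_energy = energies[:init_noise_frames].mean()

    enhanced = np.zeros(len(noisy))
    for i in range(n_frames):
        if energies[i] < vad_factor * noise_energy:
            # VAD decision "no speech": refresh the running noise estimate.
            noise_mag = smooth * noise_mag + (1 - smooth) * mags[i]
            noise_energy = smooth * noise_energy + (1 - smooth) * energies[i]
        # Subtract the noise magnitude, keeping a small spectral floor.
        clean_mag = np.maximum(mags[i] - noise_mag, floor * mags[i])
        # Resynthesize with the noisy phase and overlap-add the frame.
        frame = np.fft.irfft(clean_mag * np.exp(1j * phases[i]), n=frame_len)
        enhanced[i * hop:i * hop + frame_len] += frame
    return enhanced
```

In a pipeline like the one the summary describes, a function of this kind would be called on each utterance before features are computed, for example `features = extract_features(spectral_subtraction_vad(waveform))`, where `extract_features` is a hypothetical placeholder for the feature extraction step. The half-frame at each end of the output is tapered by the window, and any trailing samples that do not fill a whole frame are left at zero.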