Foreground Speech Segmentation and Enhancement Using Glottal Closure Instants and Mel Cepstral Coefficients

Bibliographic Details
Published in	IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 24, No. 7, pp. 1205-1219
Main Authors	Deepak, K. T.; Mahadeva Prasanna, S. R.
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 July 2016
Subjects	Acoustics; Background noise; Closures; Foreground segmentation; Glottal closure instants; Mathematical models; MCC; Microphones; MLSA; Noise measurement; Segmentation; Signal to noise ratio; Spectra; Speech; Speech disorders; Speech enhancement; State of the art; Zero band filter
Summary:	In this paper, the speech signal recorded from the desired speaker close to the microphone in a natural environment is regarded as foreground speech, and the remaining interfering sources are regarded as background noise. The paper exploits speech production features, such as glottal closure instants in the time domain and vocal tract information in the spectral domain, to segment the desired speaker's speech and then enhance it. The foreground speech is perceptually enhanced in the mel-frequency domain, using mel-cepstral coefficients as the auditory perception feature and inverting them with a mel log spectrum approximation (MLSA) filter. The focus is on enhancing the production and perceptual features of the foreground speech rather than on modeling the interfering sources. To evaluate the proposed method, speech data were collected from different speakers in different natural environments. The enhanced speech signals obtained at three different stages of the proposed method are compared with state-of-the-art methods using subjective and objective measures, and the proposed method performs better than the methods considered. In terms of the proposed objective measure, the foreground-to-background ratio, the enhancement approach gives an average improvement of 12 dB, compared with 3 dB for an existing spectral subtraction-based method. A subjective evaluation with 24 subjects corroborates the objective test results.
ISSN:	2329-9290; 2329-9304
DOI:	10.1109/TASLP.2016.2549699
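The summary does not say how the glottal closure instants are located. As a hedged illustration only, the sketch below uses zero-frequency filtering (ZFF), a widely used GCI detector from the speech literature. It is an assumption that ZFF resembles the "zero band filter" listed among the subjects, and the helper name `zff_gci_candidates`, the 10 ms window and the three trend-removal passes are hypothetical choices, not details taken from the paper.

```python
import numpy as np


def zff_gci_candidates(speech, fs, win_ms=10.0, passes=3):
    """Locate glottal closure instant (GCI) candidates by zero-frequency filtering.

    Hypothetical illustration of the generic ZFF technique, not the authors'
    code. `win_ms` should be close to one average pitch period.
    Returns (gci_indices, zff_signal); zff_signal[i] corresponds to
    speech[i + passes * half_window] because edge samples are trimmed.
    """
    s = np.asarray(speech, dtype=np.float64)
    # 1) First difference removes any DC offset from the recording.
    x = np.diff(s, prepend=s[0])
    # 2) Two cascaded zero-frequency resonators, 1/(1 - z^-1)^2 each,
    #    amount to four successive cumulative sums.
    y = x
    for _ in range(4):
        y = np.cumsum(y)
    # 3) Remove the growing polynomial trend by repeatedly subtracting the
    #    local mean over 2*half+1 samples; 'valid' mode trims the edges
    #    instead of zero-padding them.
    half = max(1, int(round(win_ms * 1e-3 * fs / 2)))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    offset = 0
    for _ in range(passes):
        if y.size <= 2 * half:
            return np.array([], dtype=int), y
        y = y[half:y.size - half] - np.convolve(y, kernel, mode="valid")
        offset += half
    # 4) Negative-to-positive zero crossings are taken as GCI candidates.
    #    This assumes the excitation at closure is negative-going; flip the
    #    input polarity if the recording is inverted.
    crossings = np.nonzero((y[:-1] < 0) & (y[1:] >= 0))[0] + 1
    return crossings + offset, y


# Toy check: a 100 Hz train of negative-going impulses sampled at 8 kHz.
fs = 8000
speech = np.zeros(fs // 2)
speech[40::80] = -1.0
gcis, zff = zff_gci_candidates(speech, fs)
print(gcis[:5])
```

On this synthetic input, candidates away from the signal edges should fall close to the impulse positions. On real recordings, the window length, signal polarity and a voiced/unvoiced decision all matter, and the procedure actually used in the paper may differ.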