An evaluation study of modulation-domain wavelet denoising method by alleviating different sub-band portions for speech enhancement

Bibliographic Details
Published in	2019 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW) pp. 1-2
Main Authors	Lin, Jian-Yu; Chen, Yan-Tong; Liu, Kuan-Yi; Hung, Jeih-Weih
Format	Conference Proceeding
Language	English
Published	IEEE 01.05.2019
Subjects	noise reduction; spectrogram; speech enhancement; temporal processing; wavelet packet decomposition
Summary:	In this study, we investigate and extend the capability of modulation-domain wavelet denoising (ModWD) for speech enhancement, primarily by analyzing the unequal importance of different sub-band signals. The recently developed ModWD has been shown to improve speech quality in adverse noise environments by processing the magnitude spectrogram of a noisy speech signal with a one-level discrete wavelet transform (DWT) and then attenuating the resulting detail portion, which was shown to be more vulnerable to noise. This study follows the idea of ModWD but first uses a wavelet packet decomposition (WPD) to decompose the magnitude spectral time series into four sub-band sequences. Then one of these four sub-band sequences is zeroed out while the other three are kept unchanged. Finally, the four sub-band sequences are used to reconstruct the updated spectrogram. The main purpose of this procedure is to evaluate the noise robustness of the magnitude series in different sub-bands, which have twice the modulation-frequency resolution of the sub-bands used in ModWD. The presented method is evaluated on a subset of the Aurora-2 connected-digit database, and the speech quality results in terms of Perceptual Evaluation of Speech Quality (PESQ) scores reveal that zeroing out the second-highest frequency band (roughly [25 Hz, 37.5 Hz]) yields the best performance.
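The sub-band zeroing procedure described in the summary can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes an orthonormal Haar wavelet (the record does not name the wavelet), a two-level WPD applied independently to each frequency bin's magnitude trajectory, edge padding to a length divisible by 4, and a frame rate of 100 Hz (10 ms frame shift); that frame rate is an assumption, chosen because four equal-width bands over [0 Hz, 50 Hz] then reproduce the stated [25 Hz, 37.5 Hz] edges. All function names (`haar_split`, `haar_merge`, `zero_subband`, `process_spectrogram`, `band_edges`) are invented for this example.

```python
import math

# Hypothetical sketch (not the authors' code): two-level Haar wavelet packet
# decomposition (WPD) of a magnitude trajectory, zeroing one of four sub-bands.
# Assumptions: Haar wavelet, edge padding, per-bin processing.
SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """One-level orthonormal Haar DWT of an even-length list -> (approx, detail)."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx, detail


def haar_merge(approx, detail):
    """Inverse of haar_split: rebuild each pair of samples and interleave them."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


def zero_subband(series, band):
    """Zero one level-2 WPD sub-band of a 1-D series and reconstruct it.

    band: 0..3 in increasing modulation frequency. Level-2 packet nodes in
    natural order are (aa, ad, da, dd); in frequency order they are
    (aa, ad, dd, da), because splitting the detail branch flips its spectrum.
    """
    if band not in range(4):
        raise ValueError("band must be 0, 1, 2 or 3")
    n = len(series)
    if n == 0:
        return []
    # Assumption: edge-pad to a multiple of 4 so two Haar levels divide evenly.
    x = list(series) + [series[-1]] * ((-n) % 4)
    a, d = haar_split(x)
    aa, ad = haar_split(a)
    da, dd = haar_split(d)
    bands = [aa, ad, dd, da]  # frequency-ordered sub-band sequences
    bands[band] = [0.0] * len(bands[band])
    aa, ad, dd, da = bands
    # Rebuild level 1 from the (possibly zeroed) packets, then the series.
    a = haar_merge(aa, ad)
    d = haar_merge(da, dd)
    return haar_merge(a, d)[:n]


def process_spectrogram(mag, band):
    """Apply zero_subband to every frequency bin's trajectory of a T x K list."""
    num_frames, num_bins = len(mag), len(mag[0])
    cols = [zero_subband([mag[t][k] for t in range(num_frames)], band)
            for k in range(num_bins)]
    return [[cols[k][t] for k in range(num_bins)] for t in range(num_frames)]


def band_edges(frame_rate_hz):
    """Nominal (ideal-filter) modulation-frequency edges of the four bands."""
    width = (frame_rate_hz / 2.0) / 4.0
    return [(i * width, (i + 1) * width) for i in range(4)]


if __name__ == "__main__":
    # Assumed 100 Hz frame rate (10 ms shift): band 2 spans [25 Hz, 37.5 Hz].
    print(band_edges(100.0))
    # An alternating series lies entirely in the highest band, so zeroing
    # band 3 removes it completely.
    print(zero_subband([1.0, -1.0] * 4, 3))
```

Under these assumptions, `process_spectrogram(mag, 2)` corresponds to zeroing the second-highest band. Because Haar filters are far from ideal, the band edges are only nominal, and the reconstructed magnitudes can become negative; a practical system might floor them at zero, which this sketch does not do.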
DOI:	10.1109/ICCE-TW46550.2019.8991839