Speech enhancement based on nonnegative matrix factorization in constant-Q frequency domain

Bibliographic Details
Published in	Applied Acoustics Vol. 174; p. 107732
Main Authors	Xu, Longting, Wei, Zhilin, Zaidi, Syed Faham Ali, Ren, Bo, Yang, Jichen
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.03.2021
Subjects	Additive noise; Constant-Q transform; NMF; SNMF; Spectrogram; Speech enhancement
Summary:	• We propose using the constant-Q transform (CQT) for speech enhancement.
• We use NMF and SNMF as the enhancement methods.
• The proposed CQT method gives high resolution at low frequencies.
• The proposed CQT method enhances speech better, especially at low SNR.
• Compared with NMF, SNMF yields better enhancement.
Speech recorded in a real environment is easily corrupted by additive noise. To reduce this noise, the noisy speech can be enhanced by applying nonnegative matrix factorization (NMF) or sparse NMF (SNMF) to its spectrogram. A high sampling rate captures more information, but the frequency range produced by the human vocal organs is low relative to the bandwidth that such a sampling rate covers, so higher resolution is needed to describe the low frequencies. A traditional spectrogram based on the short-time Fourier transform (STFT) may lack frequency resolution at lower frequencies. To address this, we propose using the constant-Q transform (CQT), which gives high resolution at low frequencies. The back-end algorithm remains NMF/SNMF. We evaluate the proposed method with Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI). The experimental results show that the proposed method enhances speech better than the STFT baseline at low signal-to-noise ratio (SNR).
ISSN:	0003-682X
DOI:	10.1016/j.apacoust.2020.107732
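
The summary's claim that the CQT resolves low frequencies more finely than the STFT can be checked with a little arithmetic. The sketch below compares the spacing between adjacent analysis bins for the two transforms. All numeric settings (16 kHz sampling rate, 512-point FFT, 32.7 Hz lowest CQT bin, 12 bins per octave) and all function names are illustrative assumptions and are not taken from the paper.

```python
import math


def stft_bin_spacing(sample_rate, n_fft):
    """Spacing in Hz between adjacent STFT bins (the same at every frequency)."""
    return sample_rate / n_fft


def cqt_center_frequencies(f_min, bins_per_octave, f_max):
    """Geometrically spaced CQT center frequencies from f_min up to f_max."""
    freqs = []
    k = 0
    while True:
        f = f_min * 2.0 ** (k / bins_per_octave)
        if f > f_max:
            break
        freqs.append(f)
        k += 1
    return freqs


def cqt_bin_spacing(f_center, bins_per_octave):
    """Distance in Hz from one CQT bin to the next; it grows with frequency."""
    return f_center * (2.0 ** (1.0 / bins_per_octave) - 1.0)


# Illustrative settings (assumptions, not the paper's configuration).
SAMPLE_RATE = 16000
N_FFT = 512
F_MIN = 32.7
BINS_PER_OCTAVE = 12

stft_spacing = stft_bin_spacing(SAMPLE_RATE, N_FFT)
cqt_freqs = cqt_center_frequencies(F_MIN, BINS_PER_OCTAVE, SAMPLE_RATE / 2)

# Count how many analysis bins each transform places at or below 500 Hz
# (the STFT count skips the DC bin at 0 Hz).
stft_bins_below_500 = sum(
    1 for k in range(1, N_FFT // 2 + 1) if k * stft_spacing <= 500.0
)
cqt_bins_below_500 = sum(1 for f in cqt_freqs if f <= 500.0)

if __name__ == "__main__":
    print(f"STFT bin spacing: {stft_spacing:.2f} Hz at every frequency")
    for f in (100.0, 500.0, 4000.0):
        spacing = cqt_bin_spacing(f, BINS_PER_OCTAVE)
        print(f"CQT bin spacing near {f:.0f} Hz: {spacing:.2f} Hz")
    print(f"Bins at or below 500 Hz: STFT {stft_bins_below_500}, CQT {cqt_bins_below_500}")
```

Under these assumed settings the CQT spacing is finer than the STFT spacing below about 525 Hz and coarser above it, which is the trade-off the summary relies on: more detail in the low-frequency region at the cost of detail at high frequencies.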