Eigenvector-Based Speech Mask Estimation for Multi-Channel Speech Enhancement

Bibliographic Details
Published in	IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 27, no. 12, pp. 2162-2172
Main Authors	Pfeifenberger, Lukas; Zöhrer, Matthias; Pernkopf, Franz
Format	Journal Article
Language	English
Published	Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.12.2019
Subjects	Acoustics; Array signal processing; Beamforming; Direction-of-arrival estimation; Eigenvalues; Eigenvector beamforming; Eigenvectors; Embedded systems; Microphones; Multi-channel speech enhancement; Neural networks; Noise measurement; Reference systems; Signal to noise ratio; Spatial data; Speech enhancement; Speech mask estimation; Speech processing; Speech recognition
Summary:	We present the Eigennet architecture for estimating a gain mask from noisy, multi-channel microphone observations. While existing mask estimators use magnitude features, our system also exploits the spatial information embedded in the phase of the data. The mask is used to obtain the Minimum Variance Distortionless Response (MVDR) and Generalized Eigenvalue (GEV) beamformers. We also derive the Phase Aware Normalization (PAN) postfilter, which corrects both magnitude and phase distortions caused by the GEV beamformer. Further, we demonstrate the properties of our eigenvector features and compare their performance with three state-of-the-art reference systems. We report performance in terms of SNR improvement and Word Error Rate (WER), using the Google Speech-to-Text API and Kaldi. Experiments are performed on the WSJ0 and CHiME4 corpora, where competitive performance in both WER and SNR is achieved.
ISSN:	2329-9290 2329-9304
DOI:	10.1109/TASLP.2019.2941592
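
Illustration:	The summary states that the estimated mask is used to obtain the MVDR and GEV beamformers. The sketch below shows one common textbook way this step is done, and is not taken from the article: the mask weights the multi-channel STFT frames to estimate speech and noise spatial covariance matrices, from which the beamformer weights follow. All function names, the array layout (frequency, time, microphone), and the trace-normalized MVDR form are assumptions made for illustration. The Eigennet mask estimator and the PAN postfilter are not reproduced here.

```python
import numpy as np


def masked_psd(Y, mask, eps=1e-10):
    """Mask-weighted spatial covariance per frequency bin.

    Y:    complex STFT, shape (F, T, M)  (assumed layout: freq, time, mic)
    mask: real gain mask in [0, 1], shape (F, T)
    Returns an array of shape (F, M, M).
    """
    weighted = mask[..., None] * Y
    psd = np.einsum("ftm,ftn->fmn", weighted, Y.conj())
    return psd / (mask.sum(axis=1)[:, None, None] + eps)


def gev_weights(psd_s, psd_n):
    """GEV beamformer: principal eigenvector of inv(psd_n) @ psd_s per bin."""
    F, M, _ = psd_s.shape
    W = np.empty((F, M), dtype=complex)
    for f in range(F):
        vals, vecs = np.linalg.eig(np.linalg.solve(psd_n[f], psd_s[f]))
        W[f] = vecs[:, np.argmax(vals.real)]
    return W


def mvdr_weights(psd_s, psd_n, ref=0):
    """MVDR beamformer in the trace-normalized form (an assumed variant).

    ref is the index of the reference microphone.
    """
    F, M, _ = psd_s.shape
    W = np.empty((F, M), dtype=complex)
    for f in range(F):
        num = np.linalg.solve(psd_n[f], psd_s[f])
        W[f] = num[:, ref] / np.trace(num)
    return W


def apply_beamformer(W, Y):
    """Apply weights per bin: output[f, t] = W[f]^H Y[f, t, :]."""
    return np.einsum("fm,ftm->ft", W.conj(), Y)
```

A typical call sequence under these assumptions would be psd_s = masked_psd(Y, mask), psd_n = masked_psd(Y, 1.0 - mask), then gev_weights or mvdr_weights, then apply_beamformer. The GEV solution is only defined up to a complex scale factor per frequency bin, which is the kind of distortion a postfilter such as the PAN described in the summary is meant to address.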