Single-Channel Speech Separation Using Soft Mask Filtering

Bibliographic Details
Published in	IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, no. 8, pp. 2299-2310
Main Authors	Radfar, M.H.; Dansereau, R.M.
Format	Journal Article
Language	English
Published	Piscataway, NJ: IEEE (Institute of Electrical and Electronics Engineers), 01.11.2007
Subjects	Acoustic signal; Applied sciences; Detection, estimation, filtering, equalization, prediction; Estimation error; Exact sciences and technology; Filtering; Filtration; Independent component analysis; Information, signal and communications theory; Mask filtering; Masks; Mathematical analysis; Mean square error; Mean square error methods; Minimum mean square error (MMSE) estimation; Miscellaneous; Modeling; Parameter estimation; Performance evaluation; s parameter; Segmentation; Separation; Signal and communications theory; Signal estimation; Signal mixing; Signal, noise; Signal processing; Signal to noise ratio; Single-channel speech separation; Sound filters; Source separation; Spectra; Spectral method; Speech; Speech coding; Speech enhancement; Speech processing; Telecommunications and information theory; Vectors; Vocal signal; Wiener filter; Wiener filtering

More Information
Summary:	We present an approach for separating two speech signals when only a single recording of their linear mixture is available. For this purpose, we derive a filter, which we call the soft mask filter, using minimum mean square error (MMSE) estimation of the log spectral vectors of the sources given the mixture's log spectral vectors. The soft mask filter's parameters are estimated from the mean and variance of the underlying sources, which are modeled using the Gaussian composite source modeling (CSM) approach. It is also shown that the binary mask filter, which has been used empirically and extensively in single-channel speech separation techniques, is in fact a simplified form of the soft mask filter. The soft mask filtering technique is compared with the binary mask and Wiener filtering approaches when the input consists of male+male, female+female, and male+female mixtures. The experimental results, in terms of signal-to-noise ratio (SNR) and segmental SNR, show that soft mask filtering outperforms binary mask and Wiener filtering.
ISSN:	1558-7916, 1558-7924
DOI:	10.1109/TASL.2007.904233
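
Illustration: the summary compares soft mask filtering against binary mask and Wiener filtering, scored by SNR and segmental SNR. The sketch below shows only those two baseline masks and the two metrics in their common textbook forms. It does not reproduce the paper's soft mask filter, whose formula is not given in this record. All function names are hypothetical, the inputs are assumed to be per-source magnitude spectra estimates, and the segmental SNR here is a plain unclamped per-frame average, which may differ from the paper's exact definition.

```python
import numpy as np


def binary_mask(mag1, mag2):
    """Hypothetical helper: 1.0 where source 1 dominates a bin, else 0.0."""
    return (mag1 >= mag2).astype(float)


def wiener_mask(mag1, mag2, eps=1e-12):
    """Hypothetical helper: power-ratio gain in [0, 1] for source 1."""
    p1 = mag1 ** 2
    p2 = mag2 ** 2
    return p1 / (p1 + p2 + eps)


def snr_db(ref, est, eps=1e-12):
    """SNR in dB of an estimate against a reference signal."""
    err = ref - est
    return 10.0 * np.log10((np.sum(ref ** 2) + eps) / (np.sum(err ** 2) + eps))


def segmental_snr_db(ref, est, frame_len=256):
    """Mean of per-frame SNRs over non-overlapping frames (assumed form)."""
    n_frames = len(ref) // frame_len
    frame_snrs = [
        snr_db(ref[i * frame_len:(i + 1) * frame_len],
               est[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]
    return float(np.mean(frame_snrs))
```

Either mask would be multiplied element-wise with the mixture's magnitude spectrum to estimate source 1. The binary mask makes a hard 0/1 decision per bin, while the Wiener-style gain varies continuously between 0 and 1; the summary states that the binary mask is a simplified form of the soft mask filter.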