Blind audio source separation using short+long term AR source models and spectrum matching

Bibliographic Details
Published in	2011 Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE) pp. 112 - 115
Main Authors	Schutz, A, Slock, D
Format	Conference Proceeding
Language	English
Published	IEEE 01.01.2011
Subjects	Correlation; Estimation; Frequency estimation; Hidden Markov models; Minimization; Source separation; Speech
Summary:	Blind audio source separation (BASS) arises in a number of applications in speech and music processing, such as speech enhancement, speaker diarization, and automated music transcription. Generally, BASS methods consider multichannel signal capture. The single-microphone case is the most difficult underdetermined case, but it often arises in practice. In the approach considered here, the main source identifiability comes from exploiting the presumed quasi-periodic nature of the sources via long-term autoregressive (AR) modeling. Indeed, musical note signals are quasi-periodic, and so is voiced speech, which constitutes the most energetic part of speech signals. We furthermore exploit prior information (e.g. speaker- or instrument-related) in the spectral envelope of the source signals via short-term AR modeling. We present an iterative method, based on minimizing the (weighted) Itakura-Saito distance, for estimating the source parameters directly from the mixture using frame-based processing.
ISBN:	1612842267 9781612842264
DOI:	10.1109/DSP-SPE.2011.5739196
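
Illustration (not part of the catalog record or of the paper): the summary refers to matching a short+long term AR spectrum to the observed spectrum under the Itakura-Saito distance. The sketch below is a minimal, unweighted version of that idea under stated assumptions. The cascade form sigma^2 / (|A(f)|^2 |B(f)|^2), with short-term polynomial A(z) = 1 + sum_k a_k z^-k and a single-tap long-term polynomial B(z) = 1 - b z^-T, is a common textbook parameterization and is assumed here; the paper's exact model, weighting, and iterative updates are not reproduced. The function names `ar_spectrum` and `itakura_saito` are hypothetical.

```python
import numpy as np

def ar_spectrum(short_coeffs, pitch_gain, pitch_lag, noise_var, n_fft):
    """Power spectrum of an assumed cascaded short+long term AR model.

    Assumed form: noise_var / (|A(f)|^2 * |B(f)|^2), evaluated on the
    n_fft // 2 + 1 non-negative FFT frequencies.
    """
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
    a = np.asarray(short_coeffs, dtype=float)
    k = np.arange(1, len(a) + 1)
    # Short-term polynomial A(z) = 1 + sum_k a_k z^-k (spectral envelope).
    A = 1 + np.exp(-1j * np.outer(omega, k)) @ a
    # Long-term polynomial B(z) = 1 - b z^-T (quasi-periodicity, period T samples).
    B = 1 - pitch_gain * np.exp(-1j * omega * pitch_lag)
    return noise_var / (np.abs(A) ** 2 * np.abs(B) ** 2)

def itakura_saito(observed, model):
    """Mean Itakura-Saito distance between two strictly positive power spectra."""
    ratio = observed / model
    return np.mean(ratio - np.log(ratio) - 1)

# Example: one source with a period of 64 samples and a first-order envelope.
source = ar_spectrum([-0.9], pitch_gain=0.9, pitch_lag=64, noise_var=1.0, n_fft=512)
distance_to_self = itakura_saito(source, source)  # zero for identical spectra
```

For a single-microphone mixture, one would typically compare the frame periodogram with the sum of the per-source model spectra (assuming uncorrelated sources) and adjust the parameters to reduce the distance; that optimization loop is left out of this sketch.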