Monaural Music Source Separation Using Convolutional Sparse Coding

Bibliographic Details
Published in	IEEE/ACM transactions on audio, speech, and language processing Vol. 24; no. 11; pp. 2158 - 2170
Main Authors	Ping-Keng Jao, Li Su, Yi-Hsuan Yang, Brendt Wohlberg
Format	Journal Article
Language	English
Published	Piscataway: IEEE, 01.11.2016; The Institute of Electrical and Electronics Engineers, Inc. (IEEE); IEEE - ACM
Subjects	Activation; Algorithms; Approximation; Coding; Computer Science; Convolution; Convolutional codes; convolutional sparse coding; Convolutional sparse coding (CSC); Dictionaries; Information Science; Instruments; Mathematics; MATHEMATICS AND COMPUTING; Monaural music source separation; multipitch estimation (MPE); Music; Musical scores; non-negative matrix factorization; nonnegative matrix factorization (NMF); phase; score-informed source separation; Separation; Sound filters; Source separation; Time-domain analysis
Summary:	We present a comprehensive performance study of a new time-domain approach for estimating the components of an observed monaural audio mixture. Unlike existing time-frequency approaches that use the product of a set of spectral templates and their corresponding activation patterns to approximate the spectrogram of the mixture, the proposed approach uses the sum of a set of convolutions of estimated activations with prelearned dictionary filters to approximate the audio mixture directly in the time domain. The approximation problem can be solved by an efficient convolutional sparse coding algorithm. The effectiveness of this approach for source separation of musical audio has been demonstrated in our prior work, but only under rather restricted and controlled conditions: the musical score of the mixture had to be known a priori, and there had to be little mismatch between the dictionary filters and the source signals. In this paper, we report an evaluation that considers wider and more practical experimental settings. These include the use of an audio-based multipitch estimation algorithm to replace the musical score, and an external dataset of audio single notes to construct the dictionary filters. Our results show that the proposed approach remains effective with a larger dictionary, and compares favorably with the state-of-the-art nonnegative matrix factorization approach. However, in the absence of the score and in the case of a small dictionary, our approach may not outperform it.
Bibliography:	USDOE Laboratory Directed Research and Development (LDRD) Program; LA-UR-15-27928; 89233218CNA000001
ISSN:	2329-9290 2329-9304
DOI:	10.1109/TASLP.2016.2598323
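
Illustration:	The time-domain model described in the summary, a mixture approximated by the sum of convolutions of sparse activations with dictionary filters, can be sketched as follows. This is a minimal sketch of the forward model only, not the convolutional sparse coding solver evaluated in the paper. The filter shapes, lengths, sample rate, and names such as note_filter, synthesize, and source_of_filter are assumptions made for illustration; in the paper the filters are prelearned from audio and the activations are estimated from the mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SAMPLES = 2000   # mixture length in samples (assumed)
FILT_LEN = 200     # length of each dictionary filter (assumed)


def note_filter(freq_hz, sr=8000, length=FILT_LEN):
    """Hypothetical stand-in for a prelearned filter: a decaying sinusoid."""
    t = np.arange(length) / sr
    d = np.sin(2 * np.pi * freq_hz * t) * np.exp(-20 * t)
    return d / np.linalg.norm(d)


# Four filters; the first two are assigned to source 0, the last two to source 1.
dictionary = np.stack([note_filter(f) for f in (220.0, 330.0, 440.0, 550.0)])
source_of_filter = np.array([0, 0, 1, 1])

# Sparse activations: a few nonzero onsets per filter (random here, whereas
# a sparse coding algorithm would estimate them from the observed mixture).
act_len = N_SAMPLES - FILT_LEN + 1
activations = np.zeros((len(dictionary), act_len))
for m in range(len(dictionary)):
    onsets = rng.choice(act_len, size=3, replace=False)
    activations[m, onsets] = rng.uniform(0.5, 1.0, size=3)


def synthesize(dictionary, activations, mask=None):
    """Sum of convolutions d_m * x_m over the filters selected by mask."""
    out = np.zeros(activations.shape[1] + dictionary.shape[1] - 1)
    for m, (d, x) in enumerate(zip(dictionary, activations)):
        if mask is None or mask[m]:
            out += np.convolve(d, x)  # full linear convolution
    return out


# The mixture uses every filter; each source uses only its own filters.
mixture = synthesize(dictionary, activations)
sources = [synthesize(dictionary, activations, source_of_filter == k)
           for k in (0, 1)]
```

Because convolution is linear, the per-source signals add up to the mixture. Under this model, separation therefore reduces to grouping the estimated activations by the source each filter belongs to.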