Deep clustering: Discriminative embeddings for segmentation and separation
| Published in | 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 31 - 35 |
|---|---|
| Main Authors | John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe |
| Format | Conference Proceeding; Journal Article |
| Language | English |
| Published | IEEE, 01.03.2016 |
Summary: We address the problem of "cocktail-party" source separation in a deep learning framework called deep clustering. Previous deep network approaches to separation have shown promising performance in scenarios with a fixed number of sources, each belonging to a distinct signal class, such as speech and noise. However, for arbitrary source classes and number, "class-based" methods are not suitable. Instead, we train a deep network to assign contrastive embedding vectors to each time-frequency region of the spectrogram in order to implicitly predict the segmentation labels of the target spectrogram from the input mixtures. This yields a deep network-based analogue to spectral clustering, in that the embeddings form a low-rank pair-wise affinity matrix that approximates the ideal affinity matrix, while enabling much faster performance. At test time, the clustering step "decodes" the segmentation implicit in the embeddings by optimizing K-means with respect to the unknown assignments. Preliminary experiments on single-channel mixtures from multiple speakers show that a speaker-independent model trained on two-speaker mixtures can improve signal quality for mixtures of held-out speakers by an average of 6 dB. More dramatically, the same model does surprisingly well with three-speaker mixtures.
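The summary describes training embeddings whose pairwise affinity matrix approximates an ideal source-assignment affinity matrix, with K-means clustering used to decode the segmentation at test time. The sketch below is an illustrative reading of that idea, not the authors' implementation: it assumes unit-norm embeddings `V` of shape (TF, D) and one-hot ideal assignments `Y` of shape (TF, C), evaluates a Frobenius-norm affinity-matching objective of the form ||V V^T - Y Y^T||_F^2 without ever forming the TF x TF matrices, and decodes sources with plain K-means. All function names, shapes, and parameters here are assumptions made for illustration.

```python
import numpy as np


def deep_clustering_loss(V, Y):
    """Affinity-matching loss ||V V^T - Y Y^T||_F^2, expanded so the
    (TF x TF) affinity matrices are never built explicitly."""
    # tr(VV^T VV^T) - 2 tr(VV^T YY^T) + tr(YY^T YY^T), with each term
    # reduced to a small matrix: D x D, D x C, and C x C respectively.
    return (np.sum((V.T @ V) ** 2)
            - 2.0 * np.sum((V.T @ Y) ** 2)
            + np.sum((Y.T @ Y) ** 2))


def kmeans_decode(V, n_sources, n_iter=50, seed=0):
    """Cluster embeddings with K-means to recover per-bin source labels."""
    rng = np.random.default_rng(seed)
    centers = V[rng.choice(len(V), size=n_sources, replace=False)]
    for _ in range(n_iter):
        # Assign every time-frequency bin to its nearest centroid.
        dists = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Update centroids, keeping the old one if a cluster goes empty.
        for k in range(n_sources):
            if np.any(labels == k):
                centers[k] = V[labels == k].mean(axis=0)
    return labels


# Toy usage: 200 time-frequency bins, 20-dim embeddings, 2 sources.
TF, D, C = 200, 20, 2
V = np.random.randn(TF, D)
V /= np.linalg.norm(V, axis=1, keepdims=True)      # unit-norm embeddings
Y = np.eye(C)[np.random.randint(0, C, size=TF)]    # hypothetical ideal assignments
print("affinity-matching loss:", deep_clustering_loss(V, Y))
print("bins per decoded source:", np.bincount(kmeans_decode(V, C)))
```

In this reading, each time-frequency bin dominated by one source gets an embedding near the other bins of that source, so clustering the embeddings yields binary masks, one per source, that can be applied to the mixture spectrogram.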
ISSN: 2379-190X
DOI: 10.1109/ICASSP.2016.7471631