SCNet: Sparse Compression Network for Music Source Separation

Bibliographic Details
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1276 - 1280
Main Authors: Tong, Weinan; Zhu, Jiaxu; Chen, Jun; Kang, Shiyin; Jiang, Tao; Li, Yang; Wu, Zhiyong; Meng, Helen
Format: Conference Proceeding
Language: English
Published: IEEE, 14.04.2024
Summary: Deep learning-based methods have made significant achievements in music source separation. However, obtaining good results while maintaining low model complexity remains challenging in super wide-band music source separation. Previous works either overlook the differences between subbands or inadequately address the problem of information loss when generating subband features. In this paper, we propose SCNet, a novel frequency-domain network that explicitly splits the spectrogram of the mixture into several subbands and introduces a sparsity-based encoder to model the different frequency bands. We use a higher compression ratio on subbands with less information to improve their information density and focus modeling capacity on subbands with more information. In this way, separation performance can be significantly improved at lower computational cost. Experimental results show that the proposed model achieves a signal-to-distortion ratio (SDR) of 9.0 dB on the MUSDB18-HQ dataset without using extra data, outperforming state-of-the-art methods. In particular, SCNet's CPU inference time is only 48% of that of HT Demucs, one of the previous state-of-the-art models.
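The core idea in the summary — splitting the spectrogram into subbands and compressing less-informative bands more aggressively — can be sketched with a toy NumPy example. The band edges, compression ratios, and the use of average pooling below are hypothetical illustrations of the general technique, not the paper's actual architecture or code:

```python
import numpy as np

def split_and_compress(spec, band_edges, ratios):
    """Split a magnitude spectrogram (freq x time) into subbands and
    downsample each band along frequency by its compression ratio,
    using average pooling. Hypothetical sketch of variable subband
    compression; band edges and ratios are illustrative only."""
    bands, start = [], 0
    for edge, r in zip(band_edges, ratios):
        band = spec[start:edge]                 # (band_bins, time)
        n = (band.shape[0] // r) * r            # drop remainder bins
        pooled = band[:n].reshape(-1, r, band.shape[1]).mean(axis=1)
        bands.append(pooled)
        start = edge
    return bands

# Toy spectrogram: 1024 frequency bins, 10 time frames.
spec = np.random.rand(1024, 10)
# Keep the low band dense; compress mid/high bands 4x and 8x (made-up split).
bands = split_and_compress(spec, band_edges=[256, 640, 1024], ratios=[1, 4, 8])
print([b.shape for b in bands])  # [(256, 10), (96, 10), (48, 10)]
```

The higher ratios on the upper bands shrink their frequency resolution, concentrating model capacity on the information-dense low band, which is the intuition the abstract describes.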
ISSN:2379-190X
DOI:10.1109/ICASSP48485.2024.10446651