Blind Sound Source Separation by Combining the Convolutional Neural Network and Degree Separator

Bibliographic Details
Published in Traitement du signal Vol. 41; no. 3; pp. 1429-1439
Main Authors Mali, Swapnil G.; Mahajan, Shrinivas P.
Format Journal Article
Language English, French
Published Edmonton International Information and Engineering Technology Association (IIETA) 01.06.2024
Summary: The objective of blind sound source separation is to separate and extract distinct audio sources from a mixture of audio signals with little to no prior information about the mixing process. This paper proposes a two-stage method that combines a Convolutional Neural Network (CNN) and a degree separator to solve the problem of blind sound source mixing in multichannel sound recordings. The first stage uses the CNN to estimate each sound source's Direction of Arrival (DOA) in each time frame. The second stage consists of a degree separator that separates the target source from multiple sources by converting the signal from the convolutional to the linear domain. The method is evaluated on a range of sound sources, including recordings from real-world audio databases created using simulated and actual room impulse responses. The estimated DOA of each source is compared against the ground-truth trajectory of that source within the complex, multi-source environment. The degree separator is evaluated with Blind Source Separation (BSS) evaluation criteria and compared to Fast Independent Component Analysis (FICA), using the separation quality parameters image-to-spatial distortion ratio (ISR), signal-to-interference ratio (SIR), and signal-to-artifact ratio (SAR). The method thus both estimates the DOA of multiple sound sources and separates them in multichannel sound recordings. Based on comprehensive evaluations performed on stationary and moving sources in simulated and actual room conditions, the proposed method, which combines CNN-DOA with a degree separator, surpasses conventional BSS approaches in separation quality.
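As an illustration of the SIR and SAR criteria named in the summary, the following is a minimal sketch and not the authors' evaluation code. It assumes single-channel signals and allows only a constant gain on each source, which is a simplification of the standard BSS evaluation decomposition. ISR is omitted because it is defined on multichannel spatial images. The function name `bss_eval_simplified` and the test signals are hypothetical.

```python
import numpy as np

def bss_eval_simplified(estimate, sources, target_index):
    """Return (SDR, SIR, SAR) in dB for one estimated source.

    Assumption: single-channel signals and gain-only distortion.
    estimate: shape (n_samples,); sources: shape (n_sources, n_samples).
    """
    estimate = np.asarray(estimate, dtype=float)
    sources = np.asarray(sources, dtype=float)
    target = sources[target_index]

    # Part of the estimate explained by the true target source.
    s_target = (estimate @ target) / (target @ target) * target

    # Least-squares projection of the estimate onto the span of all sources.
    coeffs, *_ = np.linalg.lstsq(sources.T, estimate, rcond=None)
    projection = sources.T @ coeffs

    e_interf = projection - s_target   # leakage from the other sources
    e_artif = estimate - projection    # everything no source explains

    eps = np.finfo(float).eps          # guards against division by zero

    def ratio_db(num, den):
        return 10.0 * np.log10((np.sum(num ** 2) + eps) / (np.sum(den ** 2) + eps))

    sdr = ratio_db(s_target, e_interf + e_artif)
    sir = ratio_db(s_target, e_interf)
    sar = ratio_db(s_target + e_interf, e_artif)
    return sdr, sir, sar

# Hypothetical test signals: two orthogonal sinusoids; the estimate is the
# target plus the second source leaking in at one tenth of its amplitude.
t = np.arange(1000) / 1000.0
true_sources = np.stack([np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 9 * t)])
estimate = true_sources[0] + 0.1 * true_sources[1]
sdr, sir, sar = bss_eval_simplified(estimate, true_sources, target_index=0)
print(f"SDR={sdr:.1f} dB  SIR={sir:.1f} dB  SAR={sar:.1f} dB")
```

In this toy case the interference has one tenth of the target's amplitude, so the SIR comes out at about 20 dB, and the SAR is very large because the estimate contains nothing outside the span of the two sources. Evaluating convolutive multichannel mixtures, as the paper does, would require the full filter-based decomposition instead of this gain-only version.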
ISSN: 0765-0019, 1958-5608
DOI: 10.18280/ts.410331