Separation Structure Speech Separation Model Fusing Conformer and NBC

Bibliographic Details
Published in Journal of Physics: Conference Series Vol. 2384; no. 1; pp. 12034 - 12040
Main Authors Qiang, Zhou; Xiangyang, Cheng; Tian, Ding
Format Journal Article
Language English
Published Bristol IOP Publishing 01.12.2022
Summary: The quality of speech separation affects the entire speech technology ecosystem. Blind source separation based on the Transformer dual-path recurrent neural network suffers from low utilization of local feature information, slow convergence, a large number of parameters, and long computation time. To address these problems, a blind source separation model that fuses the Conformer and the Narrow-band Conformer (NBC) is proposed. First, to address the low utilization of local speech features and the slow convergence, the Transformer block within each block of the dual-path recurrent network is replaced by a Conformer block, which improves both the utilization of local features and the convergence speed. Second, an NBC block replaces the Transformer block in the inter-block loop. The NBC block simplifies the computation of vector similarity and vector aggregation, which substantially reduces the number of parameters and the computational cost, and thus the computational complexity of the model. In experiments on the WSJ0-2mix [7] and WHAM [13] datasets, the proposed model converges faster and separates sources better than the other models compared.
ISSN: 1742-6588, 1742-6596
DOI: 10.1088/1742-6596/2384/1/012034
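
The dual-path structure described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `segment`, `dual_path_block` and `add_mean` are hypothetical, the chunks do not overlap (a simplification), and `add_mean` is a toy stand-in for the learned blocks. On one reading of the summary, the intra pass is where the Conformer block would sit and the inter pass is where the NBC block would sit.

```python
from typing import Callable, List

Sequence = List[float]


def segment(frames: Sequence, chunk_len: int) -> List[Sequence]:
    """Split frames into non-overlapping chunks, zero-padding the tail.

    Assumption: non-overlapping chunks are used here only to keep the
    sketch short; the summary does not state the chunking scheme.
    """
    pad = (-len(frames)) % chunk_len
    padded = list(frames) + [0.0] * pad
    return [padded[i:i + chunk_len] for i in range(0, len(padded), chunk_len)]


def dual_path_block(
    chunks: List[Sequence],
    intra_fn: Callable[[Sequence], Sequence],
    inter_fn: Callable[[Sequence], Sequence],
) -> List[Sequence]:
    """Apply one intra-chunk pass followed by one inter-chunk pass."""
    # Intra pass: each chunk is processed on its own (local context).
    chunks = [intra_fn(chunk) for chunk in chunks]
    # Inter pass: for every position inside a chunk, process the sequence
    # formed by that position across all chunks (global context).
    columns = [inter_fn(list(col)) for col in zip(*chunks)]
    # Transpose back to the original chunks-by-positions layout.
    return [list(row) for row in zip(*columns)]


def add_mean(seq: Sequence) -> Sequence:
    """Toy stand-in for a learned block: add the sequence mean to each item."""
    mean = sum(seq) / len(seq)
    return [x + mean for x in seq]


chunks = segment([1.0, 2.0, 3.0, 4.0, 5.0], chunk_len=2)
out = dual_path_block(chunks, intra_fn=add_mean, inter_fn=add_mean)
```

Passing the two passes as separate callables mirrors the idea in the summary that the intra and inter paths can use different block types.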