A Lightweight CNN-Conformer Model for Automatic Speaker Verification
Published in | IEEE Signal Processing Letters, Vol. 31, pp. 1–5 |
Main Authors | |
Format | Journal Article |
Language | English |
Published | New York: IEEE, 01.01.2024 (The Institute of Electrical and Electronics Engineers, Inc.) |
Summary: | Recently, the Conformer has achieved tremendous success in the speaker verification task, demonstrating that Transformer-based models can reach remarkable performance in this domain without intricate pre-training procedures. However, its macaron-style feed-forward modules introduce prohibitive computing and memory overhead. Speaker verification is often deployed in resource-constrained embedded environments such as smartphones, where only limited memory is available. In light of this, we propose two approaches to compress the Conformer-based system while maintaining its performance. First, we introduce a lightweight Convolutional Neural Network (CNN) front-end with channel-frequency attention to substitute for the shallow Conformer blocks, aiming to extract more informative speaker characteristics for subsequent processing. Second, we introduce a light Feed-forward Network (FFN) based on depth-wise separable convolution to decrease the size of the Conformer blocks. To demonstrate the effectiveness of our model, we evaluate it on three different test sets. By incorporating these two approaches, we achieve an Equal Error Rate (EER) of 0.61% on VoxCeleb-O, surpassing the previous state-of-the-art Transformer-based model, MFA-Conformer. Moreover, our model achieves a 60.6% reduction in parameters and a 36.8% reduction in FLOPs compared with MFA-Conformer. |
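The second approach in the summary builds on depth-wise separable convolution, which factors a standard convolution into a per-channel (depth-wise) filter followed by a 1×1 point-wise projection. As a rough sketch of where the parameter saving comes from (the layer widths below are hypothetical, not the paper's actual configuration), the factored form replaces a c_in × c_out × k weight tensor with a c_in × k depth-wise filter plus a c_in × c_out point-wise projection, roughly a 1/c_out + 1/k fraction of the original:

```python
# Illustrative sketch of the depth-wise separable parameter saving.
# Layer sizes are hypothetical examples, not taken from the paper.

def standard_conv_params(c_in: int, c_out: int, kernel: int) -> int:
    """Parameters of a standard 1-D convolution (weights + biases)."""
    return c_in * c_out * kernel + c_out

def depthwise_separable_params(c_in: int, c_out: int, kernel: int) -> int:
    """Depth-wise conv (one kernel-tap filter per input channel) followed
    by a point-wise 1x1 conv mapping c_in -> c_out, both with biases."""
    depthwise = c_in * kernel + c_in
    pointwise = c_in * c_out + c_out
    return depthwise + pointwise

if __name__ == "__main__":
    c_in, c_out, k = 256, 1024, 3  # hypothetical FFN expansion sizes
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std:,}  separable: {sep:,}  ratio: {sep / std:.2f}")
```

For these example sizes the separable form keeps roughly a third of the parameters, which is the kind of per-block saving that, combined with the CNN front-end, could plausibly add up to the reported 60.6% overall reduction.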
ISSN: | 1070-9908 1558-2361 |
DOI: | 10.1109/LSP.2023.3342714 |