Discriminative Speaker Representation Via Contrastive Learning with Class-Aware Attention in Angular Space

Bibliographic Details
Published in	ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5
Main Authors	Li, Zhe; Mak, Man-Wai; Meng, Helen Mei-Ling
Format	Conference Proceeding
Language	English
Published	IEEE 04.06.2023
Subjects	Acoustics; additive angular margin; Additives; attention mechanism; contrastive learning; multiobjective optimization; Optimization methods; Signal processing; Speaker verification; Speech processing; Task analysis

Summary:	Applying contrastive learning to speaker verification (SV) faces two challenges: the softmax-based contrastive loss lacks discriminative power, and hard negative pairs can easily influence learning. To overcome the first challenge, we propose a contrastive learning SV framework that incorporates an additive angular margin into the supervised contrastive loss; the margin improves the discriminative ability of the speaker representation. For the second challenge, we introduce a class-aware attention mechanism through which hard negative samples contribute less to the supervised contrastive loss. We also employ gradient-based multi-objective optimization to balance the classification and contrastive losses. Experimental results on CN-Celeb and VoxCeleb1 show that this new learning objective leads the encoder to an embedding space with strong speaker discrimination across languages.
ISSN:	2379-190X
DOI:	10.1109/ICASSP49357.2023.10096230
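
Illustrative sketch (not taken from the paper): the summary describes adding an additive angular margin to a supervised contrastive loss. The Python below shows one plausible way such a loss could be computed, under stated assumptions. The function names (l2_normalize, aam_supcon_loss), the default margin and temperature, and the choice to contrast each positive pair against the anchor's negatives only are all hypothetical. The class-aware attention and the multi-objective balancing mentioned in the summary are not modeled here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def aam_supcon_loss(embeddings, labels, margin=0.2, temperature=0.1):
    """Supervised contrastive loss with an additive angular margin (sketch).

    Assumption: the margin is added to the angle of each same-speaker pair,
    so cos(theta) becomes cos(theta + margin) for positives. This may differ
    from the formulation used in the paper.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without a same-speaker partner adds no term
        # Cosine similarity to every other sample, clamped so acos is defined.
        cos = {j: max(-1.0, min(1.0, sum(a * b for a, b in zip(z[i], z[j]))))
               for j in range(n) if j != i}
        # Negatives keep their plain cosine similarity.
        neg_sum = sum(math.exp(cos[j] / temperature)
                      for j in cos if labels[j] != labels[i])
        anchor_loss = 0.0
        for p in positives:
            # Widen the positive pair's angle by `margin` before scoring it.
            pos_logit = math.cos(math.acos(cos[p]) + margin) / temperature
            # Assumption: each positive competes with the negatives only.
            anchor_loss -= pos_logit - math.log(math.exp(pos_logit) + neg_sum)
        total += anchor_loss / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0
```

Calling aam_supcon_loss([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], [0, 0, 1, 1]) scores a batch in which same-speaker embeddings are already close. Setting margin=0.0 reduces the positive logit to the plain cosine similarity, which makes it easy to compare the two variants on the same batch.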