Cosine metric learning based speaker verification

Bibliographic Details
Published in: Speech Communication, Vol. 118, pp. 10-20
Main Authors: Bai, Zhongxin; Zhang, Xiao-Lei; Chen, Jingdong
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V., 01.04.2020

Summary:
• It proposes two cosine metric learning (CML) back-end algorithms. The first one, named m-CML, aims to enlarge the between-class distance, with a regularization term to control the within-class variance. The second one, named v-CML, attempts to reduce the within-class variance, with a regularization term to prevent the between-class distance from shrinking.
• It combines m-CML with an i-vector front-end, since m-CML is good at enlarging the between-class distance of Gaussian score distributions.
• It combines v-CML with a d-vector or x-vector front-end, as v-CML is able to significantly reduce the within-class variance of heavy-tailed score distributions.
• Experimental results on the NIST and SITW speaker recognition evaluation corpora with i-vector, d-vector, and x-vector front-ends demonstrate the effectiveness of the proposed algorithms.

The performance of speaker verification depends on the overlap region between the decision scores of true and impostor trials. Motivated by the fact that this overlap region can be reduced by maximizing the between-class distance while minimizing the within-class variance of the trials, we present in this paper two cosine metric learning (CML) back-end algorithms. The first one, named m-CML, aims to enlarge the between-class distance, with a regularization term to control the within-class variance. The second one, named v-CML, attempts to reduce the within-class variance, with a regularization term to prevent the between-class distance from shrinking. The regularization terms in the CML methods can be initialized by a traditional channel compensation method, e.g., linear discriminant analysis. The two algorithms are combined with front-end processing for speaker verification. To validate their effectiveness, m-CML is combined with an i-vector front-end, since it is good at enlarging the between-class distance of Gaussian score distributions, while v-CML is combined with a d-vector or x-vector front-end, as it is able to significantly reduce the within-class variance of heavy-tailed score distributions. Experimental results on the NIST and SITW speaker recognition evaluation corpora show that the proposed algorithms outperform the channel compensation methods used for their initialization, and are competitive with the probabilistic linear discriminant analysis back-end in terms of performance. For comparison, we also applied the m-CML and v-CML methods to the i-vector and x-vector front-ends.
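The paper's exact objective functions are not reproduced in this record. Purely as a rough sketch of the general idea behind a cosine metric learning back-end (not the authors' formulation), the Python example below learns a linear projection so that cosine scores of target and impostor trials separate; the projection matrix A, the weight lam, the helper name cml_loss, and the toy data are all hypothetical, and the LDA initialization is only indicated by a comment.

import torch

# Hypothetical CML-style back-end sketch. A linear projection A is trained so
# that cosine scores of same-speaker (target) trials and different-speaker
# (impostor) trials separate. This is NOT the paper's objective, only the idea:
#   "m" variant: enlarge the between-class distance (gap between mean target
#                and mean impostor scores), with a within-class variance penalty.
#   "v" variant: shrink the within-class variance, with a penalty that keeps
#                the between-class distance from getting smaller.
def cml_loss(A, x1, x2, labels, variant="m", lam=0.1):
    """Cosine scores after projection; labels: 1 = target, 0 = impostor."""
    s = torch.nn.functional.cosine_similarity(x1 @ A, x2 @ A, dim=1)
    tgt, imp = s[labels == 1], s[labels == 0]
    between = tgt.mean() - imp.mean()   # between-class distance of the scores
    within = tgt.var() + imp.var()      # within-class variance of the scores
    if variant == "m":
        return -between + lam * within  # maximize distance, control variance
    return within - lam * between       # minimize variance, preserve distance

# Toy usage on random trial pairs (dimensions and data are made up).
torch.manual_seed(0)
d, n = 64, 200
x1, x2 = torch.randn(n, d), torch.randn(n, d)
labels = torch.randint(0, 2, (n,))
A = torch.eye(d, requires_grad=True)    # in the paper, initialization can come
opt = torch.optim.SGD([A], lr=0.1)      # from channel compensation such as LDA
for _ in range(50):
    opt.zero_grad()
    loss = cml_loss(A, x1, x2, labels, variant="v")
    loss.backward()
    opt.step()

In this toy sketch, the "m" variant maximizes the score gap while penalizing score variance, and the "v" variant minimizes the variance while discouraging the gap from shrinking, mirroring the roles the summary describes for m-CML and v-CML; the sketch omits any constraint on A, which a real back-end would need.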
ISSN: 0167-6393
eISSN: 1872-7182
DOI: 10.1016/j.specom.2020.02.003