Cross-domain speaker recognition using domain adversarial siamese network with a domain discriminator



Bibliographic Details
Published in	Electronics letters Vol. 56; no. 14; pp. 737 - 739
Main Authors	Chen, Zhigao; Miao, Xiaoxiao; Xiao, Runqiu; Wang, Wenchao
Format	Journal Article
Language	English
Published	The Institution of Engineering and Technology 09.07.2020
Subjects	adversarial-learning mechanism; AISHELL-Wake-Up-1 data set; automatic speaker recognition; background training; cross-channel data; cross-domain speaker recognition; deep neural networks; distribution mismatch; domain adversarial methods; domain adversarial siamese network; domain consistence; domain discriminator; domain distributions; domain influence; domain mismatch; domain-invariant; evaluation data; Gaussian processes; learning (artificial intelligence); neural nets; NIST speaker recognition evaluation; speaker recognition; speaker-discriminative; Speech and audio processing and translation; test data; unknown domain

Summary:	Automatic speaker recognition is now widely used in the real world, but its performance degrades considerably under domain mismatch, such as differences in channel, language or distance. Recent studies have introduced adversarial learning into deep neural networks to reduce the distribution mismatch between domains. However, they only aligned the domain distributions between the background training data and the evaluation data; few focused on the differing distributions of the enrol and test data. In this Letter, the authors propose a domain adversarial siamese (DAS) network that tries to eliminate the influence of domain on the speech representation. Specifically, they feed the network with speech pairs from the same speaker, and a domain discriminator is introduced to capture the domain consistence or difference between pairs. The final embeddings become domain-invariant and more speaker-discriminative. A cross-channel data set is sorted out from the NIST speaker recognition evaluation, and further experiments are conducted on the AISHELL-Wake-Up-1 data set. Results show that DAS performs as effectively as typical domain adversarial methods, improving results by at least 10% in equal error rate (EER). Furthermore, it proves more effective in scenarios such as unbalanced data amounts and unknown domains, achieving a relative improvement of 11%.
ISSN:	0013-5194, 1350-911X
DOI:	10.1049/el.2020.0673