Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization
Published in | arXiv.org |
---|---|
Main Authors | Thienpondt, Jenthe; Desplanques, Brecht; Demuynck, Kris |
Format | Paper Journal Article |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 10.08.2020 |
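Subjects | Computer Science - Computation and Language; Computer Science - Sound; Computer simulation; Domains; Embedding; Persian language; Prototypes; Training; Verification |
Online Access | https://arxiv.org/abs/2007.07689 |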
Abstract | In this paper we describe the top-scoring IDLab submission for the text-independent task of the Short-duration Speaker Verification (SdSV) Challenge 2020. The main difficulty of the challenge lies in the widely varying degree of phonetic overlap between the potentially cross-lingual trials, along with the limited availability of in-domain DeepMine Farsi training data. We introduce domain-balanced hard prototype mining to fine-tune the state-of-the-art ECAPA-TDNN x-vector-based speaker embedding extractor. The sample mining technique efficiently exploits the distances between the speaker prototypes of the popular AAM-softmax loss function to construct challenging training batches that are balanced at the domain level. To enhance the scoring of cross-lingual trials, we propose a language-dependent s-norm score normalization. The imposter cohort contains only data from the Farsi target domain, which simulates the enrollment data always being Farsi. If a Gaussian-backend language model detects that the test speaker embedding contains English, a cross-language compensation offset determined on the AAM-softmax speaker prototypes is subtracted from the maximum expected imposter mean score. A fusion of five systems with minor topological tweaks resulted in a final MinDCF of 0.065 and an EER of 1.45% on the SdSVC evaluation set. |
---|---|
Author | Thienpondt, Jenthe; Desplanques, Brecht; Demuynck, Kris |
BackLink | View published paper (access to full text may be restricted): https://doi.org/10.21437/Interspeech.2020-2662; View paper in arXiv: https://doi.org/10.48550/arXiv.2007.07689 |
ContentType | Paper Journal Article |
Copyright | 2020. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.48550/arxiv.2007.07689 |
Discipline | Physics |
EISSN | 2331-8422 |
ExternalDocumentID | 2007_07689 |
Genre | Working Paper/Pre-Print |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
OpenAccessLink | https://arxiv.org/abs/2007.07689 |
PublicationCentury | 2000 |
PublicationDate | 20200810 |
PublicationDateYYYYMMDD | 2020-08-10 |
PublicationDecade | 2020 |
PublicationPlace | Ithaca |
PublicationTitle | arXiv.org |
PublicationYear | 2020 |
Publisher | Cornell University Library, arXiv.org |
SubjectTerms | Computer Science - Computation and Language; Computer Science - Sound; Computer simulation; Domains; Embedding; Persian language; Prototypes; Training; Verification |
Title | Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization |
URI | https://www.proquest.com/docview/2424272982; https://arxiv.org/abs/2007.07689 |
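
Note on the scoring step described in the abstract: the language-dependent s-norm can be illustrated with a minimal sketch. The record does not give the authors' implementation, so everything below is an assumption made for illustration: cosine scoring, plain (non-adaptive) symmetric normalization over the full cohort, the function names `cosine_scores` and `language_dependent_s_norm`, and the choice to subtract the compensation offset from the test-side cohort mean. The language decision (a Gaussian backend in the abstract) is taken here as a precomputed boolean, and the offset as a given number.

```python
import numpy as np


def cosine_scores(x, cohort):
    """Cosine similarity between one embedding x (d,) and each cohort row (n, d)."""
    x = x / np.linalg.norm(x)
    cohort = cohort / np.linalg.norm(cohort, axis=1, keepdims=True)
    return cohort @ x


def language_dependent_s_norm(enroll, test, cohort, test_is_english, offset=0.0):
    """Symmetric score normalization (s-norm) with a hypothetical language offset.

    enroll, test    : speaker embeddings, shape (d,)
    cohort          : imposter cohort embeddings, shape (n, d); Farsi-only in the abstract
    test_is_english : language decision, assumed to be computed elsewhere
    offset          : cross-language compensation offset (assumed given)
    """
    # Raw trial score between the enrollment and test embeddings.
    raw = float(cosine_scores(enroll, test[None, :])[0])

    # Imposter score statistics for each side of the trial.
    e_scores = cosine_scores(enroll, cohort)
    t_scores = cosine_scores(test, cohort)
    mu_e, sd_e = e_scores.mean(), e_scores.std()
    mu_t, sd_t = t_scores.mean(), t_scores.std()

    if test_is_english:
        # Assumption: the offset lowers the test-side imposter mean, reflecting
        # that an English test utterance scores lower against a Farsi cohort.
        mu_t = mu_t - offset

    # Average of the two z-normalized views of the same raw score.
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)
```

Under these assumptions, a positive offset raises the normalized score of a trial flagged as English by `0.5 * offset / sd_t` relative to the same trial treated as Farsi, and an offset of zero reduces the function to ordinary s-norm.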