Relational Data Selection for Data Augmentation of Speaker-dependent Multi-band MelGAN Vocoder

Bibliographic Details
Main Authors Wu, Yi-Chiao; Hu, Cheng-Hung; Lee, Hung-Shin; Peng, Yu-Huai; Huang, Wen-Chin; Tsao, Yu; Wang, Hsin-Min; Toda, Tomoki
Format Journal Article
Language English
Published 10.06.2021
DOI 10.48550/arxiv.2106.05629
Copyright http://creativecommons.org/licenses/by/4.0
Online Access https://arxiv.org/abs/2106.05629

Abstract Neural vocoders can now generate very high-fidelity speech when a sufficient amount of training data is available. Although a speaker-dependent (SD) vocoder usually outperforms a speaker-independent (SI) vocoder, collecting a large amount of data from a specific target speaker is impractical for most real-world applications. To tackle the problem of limited target data, this paper proposes a data augmentation method based on speaker representations and the similarity measurement used in speaker verification. The proposed method selects utterances whose speaker identities are similar to that of the target speaker from an external corpus, and then combines the selected utterances with the limited target data for SD vocoder adaptation. The evaluation results show that, compared with a vocoder adapted using only the limited target data, the vocoder adapted using the augmented data improves both the quality and the speaker similarity of the synthesized speech.
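
The abstract describes a selection-then-adaptation pipeline. The sketch below illustrates only the selection step, assuming speaker embeddings (e.g., x-vectors or d-vectors from a speaker-verification model) have already been extracted for both the limited target data and the external corpus; the cosine-similarity scoring and the top_k selection budget are illustrative assumptions, not the authors' exact implementation.

# Illustrative sketch of similarity-based utterance selection (not the released code).
# Assumes precomputed speaker embeddings as NumPy arrays; top_k is a hypothetical budget.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors, with a small epsilon for stability.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_augmentation_utterances(target_embs, external_embs, top_k=500):
    """Rank external-corpus utterances by speaker similarity to the target speaker.

    target_embs:   list of embeddings extracted from the limited target-speaker data
    external_embs: dict mapping utterance_id -> embedding for the external corpus
    Returns the top_k utterance ids most similar to the target-speaker centroid.
    """
    # Represent the target speaker by the mean of its utterance embeddings.
    centroid = np.mean(np.stack(target_embs), axis=0)
    scored = [(utt_id, cosine_similarity(centroid, emb))
              for utt_id, emb in external_embs.items()]
    scored.sort(key=lambda x: x[1], reverse=True)  # most similar first
    return [utt_id for utt_id, _ in scored[:top_k]]

# The selected utterances would then be pooled with the limited target data
# to adapt (fine-tune) the speaker-dependent multi-band MelGAN vocoder.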