Relational Data Selection for Data Augmentation of Speaker-dependent Multi-band MelGAN Vocoder
Main Authors | Wu, Yi-Chiao; Hu, Cheng-Hung; Lee, Hung-Shin; Peng, Yu-Huai; Huang, Wen-Chin; Tsao, Yu; Wang, Hsin-Min; Toda, Tomoki |
---|---|
Format | Journal Article (preprint) |
Language | English |
Published | 2021-06-10 |
Online Access | https://arxiv.org/abs/2106.05629 |
Abstract | Neural vocoders can now generate very high-fidelity speech when abundant training data is available. Although a speaker-dependent (SD) vocoder usually outperforms a speaker-independent (SI) vocoder, it is impractical to collect a large amount of data from a specific target speaker for most real-world applications. To tackle the problem of limited target data, this paper proposes a data augmentation method based on speaker representations and the similarity measurement of speaker verification. The proposed method selects, from an external corpus, utterances whose speaker identity is similar to that of the target speaker, and then combines the selected utterances with the limited target data for SD vocoder adaptation. The evaluation results show that, compared with the vocoder adapted using only the limited target data, the vocoder adapted using the augmented data improves both the quality and the speaker similarity of the synthesized speech. |
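The selection step described in the abstract amounts to a nearest-neighbour search in a speaker-embedding space. Below is a minimal Python sketch of that idea, assuming utterance-level speaker embeddings have already been extracted with a pretrained speaker-verification model; the function names, the centroid averaging over the target utterances, and the fixed `top_k` cutoff are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of similarity-based utterance selection (illustrative,
# not the authors' released code). Assumes each utterance has already
# been mapped to a fixed-size speaker embedding by a pretrained
# speaker-verification model.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def select_augmentation_utterances(
    target_embeddings: list,   # embeddings of the limited target data
    external_corpus: dict,     # utterance ID -> embedding
    top_k: int = 100,          # assumed selection budget
) -> list:
    """Return the IDs of the top_k external utterances whose speaker
    embeddings are most similar to the target speaker."""
    # Summarize the target speaker as the mean of its utterance embeddings.
    target_centroid = np.mean(target_embeddings, axis=0)

    # Score every external utterance against the target representation.
    scored = [
        (utt_id, cosine_similarity(emb, target_centroid))
        for utt_id, emb in external_corpus.items()
    ]

    # Keep the most similar utterances for augmentation.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [utt_id for utt_id, _ in scored[:top_k]]
```

Under this reading, the selected utterances would then be pooled with the limited target data to adapt the speaker-dependent multi-band MelGAN vocoder.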
Copyright | http://creativecommons.org/licenses/by/4.0 |
DOI | 10.48550/arxiv.2106.05629 |