Synthesis of everyday conversational speech based on fine-tuning with a corpus for speech synthesis
Published in | Acoustical Science and Technology Vol. 46; no. 1; pp. 103 - 105 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Tokyo: Acoustical Society of Japan, 01.01.2025; Japan Science and Technology Agency |
Subjects | |
Summary: | In this letter, we propose modeling prosodic and segmental features separately for everyday conversational speech synthesis, addressing the challenges posed by low-quality recordings in the Corpus of Everyday Japanese Conversation (CEJC). A FastSpeech 2 model is first trained on the conversation corpus and then fine-tuned on a corpus for speech synthesis. Experimental results show that this fine-tuning approach enhances synthesis quality while preserving the nuances of everyday conversations. |
---|---|
ISSN: | 1346-3969 1347-5177 |
DOI: | 10.1250/ast.e24.35 |