Synthesis of everyday conversational speech based on fine-tuning with a corpus for speech synthesis

Bibliographic Details
Published in: Acoustical Science and Technology, Vol. 46, no. 1, pp. 103-105
Main Authors: Mori, Hiroki; Furukawa, Kota
Format: Journal Article
Language: English
Published: Tokyo, ACOUSTICAL SOCIETY OF JAPAN, 01.01.2025
Japan Science and Technology Agency

Summary: In this letter, we propose separate modeling of prosodic and segmental features for everyday conversational speech synthesis, addressing challenges posed by low-quality recordings in the Corpus of Everyday Japanese Conversation (CEJC). The FastSpeech 2 model is first trained on the conversation corpus and subsequently fine-tuned on a corpus for speech synthesis. Experimental results show that this fine-tuning approach enhances synthesis quality while preserving the nuances of everyday conversations.
ISSN: 1346-3969, 1347-5177
DOI: 10.1250/ast.e24.35