Synthesis of everyday conversational speech based on fine-tuning with a corpus for speech synthesis

Bibliographic Details
Published in	Acoustical Science and Technology, Vol. 46, no. 1, pp. 103-105
Main Authors	Mori, Hiroki; Furukawa, Kota
Format	Journal Article
Language	English
Published	Tokyo: Acoustical Society of Japan; Japan Science and Technology Agency, 01.01.2025
Subjects	Conversation; Conversational agent; Corpus linguistics; Everyday conversation; Japanese language; Linguistics; Prosody; Speech recognition; Speech synthesis

Summary:	In this letter, we propose a separate modeling of prosodic and segmental features for everyday conversational speech synthesis, addressing challenges posed by low-quality recordings in the Corpus of Everyday Japanese Conversation (CEJC). Initially, the FastSpeech 2 model is trained on the conversation corpus and subsequently fine-tuned on a corpus for speech synthesis. Experimental results show that this fine-tuning approach enhances synthesis quality while preserving the nuances of everyday conversations.
ISSN:	1346-3969; 1347-5177
DOI:	10.1250/ast.e24.35