An evaluation of cost functions sensitively capturing local degradation of naturalness for segment selection in concatenative speech synthesis

Bibliographic Details
Published in: Speech Communication, Vol. 48, No. 1, pp. 45-56
Main Authors: Toda, Tomoki; Kawai, Hisashi; Tsuzaki, Minoru; Shikano, Kiyohiro
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V., 2006
Summary: In this paper, we evaluate various cost functions for selecting a segment sequence in terms of the correspondence between the cost and perceptual scores for the naturalness of synthetic speech. The results demonstrate that the conventional average cost, which shows the degradation of naturalness over the entire synthetic utterance, has better correspondence to the perceptual scores than the maximum cost, which shows the worst local degradation of naturalness. Furthermore, it is shown that the root mean square (RMS) cost, which takes into account both the average cost and the maximum cost, has the best correspondence. We also show that the naturalness of synthetic speech can be improved by using the RMS cost for segment selection. We then investigate the effects of applying the RMS cost to segment selection in comparison with those of applying the average cost. Experimental results show that segment selection based on the RMS cost performs a larger number of concatenations causing slight local degradation, so that concatenations causing greater local degradation are avoided.
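
The three ways of aggregating local costs that the summary compares can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the two candidate sequences, and the local cost values are all hypothetical, and the record does not say how the paper computes each local cost. The sketch only shows how average, maximum, and RMS aggregation rank two sequences that have the same average cost.

```python
import math


def average_cost(local_costs):
    """Mean of the local costs: degradation spread over the whole utterance."""
    return sum(local_costs) / len(local_costs)


def maximum_cost(local_costs):
    """Largest local cost: the single worst local degradation."""
    return max(local_costs)


def rms_cost(local_costs):
    """Root mean square of the local costs: large values weigh more than small ones."""
    return math.sqrt(sum(c * c for c in local_costs) / len(local_costs))


# Hypothetical local costs for two candidate segment sequences (illustrative only).
spiky = [0.1, 0.1, 0.1, 0.9]  # mostly clean, with one severe concatenation
even = [0.3, 0.3, 0.3, 0.3]   # several slight degradations, none severe

for name, costs in (("spiky", spiky), ("even", even)):
    print(
        name,
        round(average_cost(costs), 3),
        round(maximum_cost(costs), 3),
        round(rms_cost(costs), 3),
    )
```

With these made-up values, both sequences have the same average cost, so the average cannot tell them apart, while the RMS cost is lower for the sequence with several slight degradations. This matches the behavior the summary reports for RMS-based selection. For any list of non-negative costs the RMS value lies between the average and the maximum.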
ISSN: 0167-6393, 1872-7182
DOI: 10.1016/j.specom.2005.05.011