Variable-Length Unit Selection in TTS Using Structural Syntactic Cost

Bibliographic Details
Published in	IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, No. 4, pp. 1227-1235
Main Authors	Chung-Hsien Wu, C.-C. Hsia, Jiun-Fu Chen, Jhing-Fa Wang
Format	Journal Article
Language	English
Published	Piscataway, NJ: IEEE (Institute of Electrical and Electronics Engineers), May 2007
Subjects	Algorithms; Applied sciences; Computer science; Concatenated codes; Costs; Dynamic programming; Exact sciences and technology; Heuristic algorithms; Humans; Information, signal and communications theory; Latent semantic analysis (LSA); Mathematical analysis; probabilistic context-free grammar (PCFG); Semantics; Signal processing; Spatial databases; Speech; Speech processing; Speech recognition; Speech synthesis; Statistical analysis; syntactic structure; Synthesizers; Telecommunications and information theory; variable-length unit selection; Vectors (mathematics); Performance evaluation; Speech analysis; Probabilistic approach; Semantic analysis; Verbal perception; Algorithm; Statistical test; Linguistic analysis; Database; Context free grammar; Economic aspect; Cost analysis; Syntax
ISSN	1558-7916
DOI	10.1109/TASL.2006.889752

Summary:	This paper presents a variable-length unit selection scheme that uses a syntactic cost to select text-to-speech (TTS) synthesis units. The syntactic structure of a sentence is derived from a probabilistic context-free grammar (PCFG) and represented as a syntactic vector. The syntactic difference between target and candidate units (words or phrases) is estimated by the cosine measure, with the inside probability of the PCFG acting as a weight. Latent semantic analysis (LSA) is applied to reduce the dimensionality of the syntactic vectors. A dynamic programming algorithm is adopted to obtain the concatenated unit sequence with minimum cost. A speech database rich in syntactic properties is designed and collected as the unit inventory. Several experiments with statistical testing are conducted to assess the quality of the synthetic speech as perceived by human subjects. The proposed method outperforms a synthesizer that does not consider syntactic properties, and the structural syntax estimates the substitution cost better than acoustic features alone.
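The summary combines two ideas: a cosine-based syntactic substitution cost and a dynamic programming search for the cheapest unit sequence. The sketch below illustrates how such pieces might fit together; it is not the authors' formulation. Several assumptions are made: the cost form `weight * (1 - cosine)` is a guess at how the inside probability "acts as a weight", the per-slot `weights` are a hypothetical stand-in for PCFG inside probabilities, `concat_cost` is a hypothetical join-cost callback, the target is pre-segmented into fixed word/phrase slots (so true variable-length spans are not modeled), and the LSA dimensionality reduction step is omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Candidate = Tuple[str, Vector]  # (unit_id, syntactic vector)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two syntactic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def syntactic_cost(target: Vector, candidate: Vector, weight: float) -> float:
    """Assumed substitution cost: weighted cosine distance (hypothetical form)."""
    return weight * (1.0 - cosine(target, candidate))


def select_units(
    targets: List[Vector],
    weights: List[float],
    candidates: List[List[Candidate]],
    concat_cost: Callable[[str, str], float],
) -> Tuple[float, List[str]]:
    """Viterbi-style DP choosing one candidate per slot with minimum total cost."""
    n = len(targets)
    # best[i][j]: lowest accumulated cost of a path ending at candidate j of slot i
    best = [[syntactic_cost(targets[0], vec, weights[0]) for _, vec in candidates[0]]]
    back = [[-1] * len(candidates[0])]  # back-pointers to the previous slot
    for i in range(1, n):
        row, ptr = [], []
        for uid, vec in candidates[i]:
            sub = syntactic_cost(targets[i], vec, weights[i])
            # Pick the predecessor minimizing accumulated cost plus join cost.
            k_best, c_best = min(
                ((k, best[i - 1][k] + concat_cost(candidates[i - 1][k][0], uid))
                 for k in range(len(candidates[i - 1]))),
                key=lambda t: t[1],
            )
            row.append(c_best + sub)
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest candidate in the final slot.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k])
    total = best[-1][j]
    path: List[str] = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j][0])
        j = back[i][j]
    path.reverse()
    return total, path


if __name__ == "__main__":
    targets = [[1.0, 0.0], [0.0, 1.0]]
    candidates = [
        [("a1", [1.0, 0.0]), ("a2", [0.0, 1.0])],
        [("b1", [0.0, 1.0]), ("b2", [1.0, 0.0])],
    ]
    join = lambda prev, uid: 0.0 if (prev, uid) == ("a1", "b1") else 0.5
    print(select_units(targets, [1.0, 1.0], candidates, join))
```

Here the syntactically matching candidates `a1` and `b1` also join cheaply, so they are selected with zero total cost. Keeping the substitution and concatenation terms separate makes it easy to compare a syntactic cost against an acoustic one, which mirrors the comparison reported in the summary.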