Mixing Synthetic and Recorded Signals for Audio-Book Generation

Bibliographic Details
Published in	Speech and Computer Vol. 12335; pp. 479 - 489
Main Authors	Shamsi, Meysam; Barbot, Nelly; Lolive, Damien; Chevelu, Jonathan
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2020
Series	Lecture Notes in Computer Science
Subjects	Audio-book generation; Quality evaluation; Text-to-Speech

More Information
Summary:	Using TTS systems helps reduce the cost of audio-book generation. This paper investigates mixing synthetic and recorded natural speech signals to control the trade-off between the overall quality of an audio book and its production cost. First, fully synthetic signals and mixed synthetic and natural signals are compared perceptually at different levels of synthetic quality. The listeners’ judgments show that mixed signals are preferred. Next, the order and configuration of mixed signals are studied. The perceptual test does not show any significant difference between the configurations. Finally, the paper investigates the effect of synthetic quality and the bias that the starting and ending parts of mixed signals introduce in the perceptual test.
ISBN:	3030602753 9783030602758
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-030-60276-5_46