Feature Based Adaptation for Speaking Style Synthesis

Bibliographic Details
Published in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5304-5308
Main Authors Wu, Xixin; Sun, Lifa; Kang, Shiyin; Liu, Songxiang; Wu, Zhiyong; Liu, Xunying; Meng, Helen
Format Conference Proceeding
Language English
Published IEEE 01.04.2018
Summary: Speaking style plays an important role in the expressivity of speech for communication, and is therefore important for synthetic speech as well. Speaking style adaptation faces the difficulty that data for specific styles may be limited and hard to obtain in large amounts. A possible solution is to train the speech synthesizer on data from speaking styles that are more widely available, and then adapt it to the target style for which data is scarce. Conventional DNN adaptation approaches directly update the top layers of a well-trained, style-dependent model towards the target style, without considering the detailed local, context-level mismatch between the original and the target styles. To address this issue, this paper investigates two frame-level, input feature-based style adaptation techniques. They use style features extracted from (1) a bottleneck DNN trained on target-style data, and (2) a novel cross-style residual feature regression DNN. These features are used for top-layer adaptation of a well-trained, style-dependent synthesis network. Experimental results on adapting the declarative style to the interrogative style demonstrate the effectiveness of the proposed style features in improving the expressiveness of synthesized speech for the interrogative style, while maintaining speech quality.
ISSN:2379-190X
DOI:10.1109/ICASSP.2018.8462178
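
Illustrative sketch (not taken from the paper): the summary describes top-layer adaptation in which frame-level style features are supplied to a well-trained network and only the top layers are updated. The toy example below shows that general idea under strong assumptions: a two-layer network with hypothetical weights, a single made-up style feature per frame standing in for the bottleneck or residual-regression features, and plain per-frame SGD on squared error. The layer sizes, learning rate, data, and all function names are assumptions chosen for illustration, not the authors' configuration.

```python
import math
import random

def hidden(x, W1, b1):
    """Frozen lower layer of the pretrained network: tanh(W1 x + b1)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]

def predict(x, s, W1, b1, W2, b2):
    """Top layer reads the hidden activations concatenated with style features s."""
    z = hidden(x, W1, b1) + list(s)
    return [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(W2, b2)]

def mse(data, W1, b1, W2, b2):
    """Mean squared error over (x, s, y) frames."""
    total, count = 0.0, 0
    for x, s, y in data:
        out = predict(x, s, W1, b1, W2, b2)
        total += sum((o - t) ** 2 for o, t in zip(out, y))
        count += len(y)
    return total / count

def adapt_top_layer(data, W1, b1, W2, b2, lr=0.05, epochs=200):
    """Per-frame SGD on squared error; only W2 and b2 are updated, W1 and b1 stay frozen."""
    for _ in range(epochs):
        for x, s, y in data:
            z = hidden(x, W1, b1) + list(s)
            out = predict(x, s, W1, b1, W2, b2)
            for k, (o, t) in enumerate(zip(out, y)):
                err = o - t
                for j, zj in enumerate(z):
                    W2[k][j] -= lr * err * zj
                b2[k] -= lr * err
    return W2, b2

# Hypothetical "pretrained" weights (assumed values, for illustration only).
random.seed(0)
W1 = [[0.5, -0.3], [0.2, 0.8]]
b1 = [0.0, 0.1]
W2 = [[0.7, -0.4, 0.0]]  # last column weights the style feature; starts at 0
b2 = [0.05]
W1_before = [row[:] for row in W1]

# Toy "target-style" frames: the original output plus a style-dependent offset.
data = []
for _ in range(20):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    s = [random.uniform(0, 1)]  # stand-in for an extracted style feature
    h = hidden(x, W1, b1)
    y = [0.7 * h[0] - 0.4 * h[1] + 0.05 + 0.8 * s[0]]
    data.append((x, s, y))

loss_before = mse(data, W1, b1, W2, b2)
adapt_top_layer(data, W1, b1, W2, b2)
loss_after = mse(data, W1, b1, W2, b2)
print(loss_before, loss_after)
```

Initializing the style-feature weight to zero means the adapted network starts out identical to the original one, so any change in output comes only from the adaptation step. In the toy data the target differs from the original output only through the style feature, which is the situation where an extra frame-level input can help the top layer.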