A CONTROLLABLE CROSS-GENDER VOICE CONVERSION FOR SOCIAL ROBOT

Bibliographic Details
Published in: 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), pp. 1-4
Main Authors: Fu, Changzeng; Liu, Chaoran; Ishi, Carlos Toshinori; Ishiguro, Hiroshi
Format: Conference Proceeding
Language: English, Japanese
Published: IEEE, 18.10.2022

More Information
Summary: In this study, we propose a voice conversion (VC) model with controllable conversion intensity. In particular, we combine CycleGAN with a transformer module and build a condition-embedding network that serves as a control parameter. The model is first pretrained with self-supervised learning on a voice reconstruction task, with the condition set to male-to-male or female-to-female. Once pretraining is complete, the model is retrained on the cross-gender voice conversion task, with the condition set to male-to-female or female-to-male. At test time, the condition is intended to be used as a controllable parameter (a scale). The proposed method was evaluated on the Voice Conversion Challenge dataset and compared with two baselines (CycleGAN and CycleTransGAN) using objective and subjective evaluations. The results show that the proposed model converts voices with competitive performance while additionally offering cross-gender controllability.
DOI: 10.1109/ACIIW57231.2022.10086038
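The summary says that a condition embedding is trained first on same-gender reconstruction and then on cross-gender conversion, and that the condition acts as a scale at test time, but it does not say how that embedding is built. The following is a minimal, hypothetical sketch of one way such a control could work, assuming that the two training conditions map to the scalar values 0.0 (same-gender) and 1.0 (cross-gender), and that intermediate scales linearly interpolate between two learned anchor vectors. The names `training_condition`, `ConditionEmbedding`, and `condition_frames`, the interpolation scheme, and the frame-concatenation step are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]
GENDERS = {"male", "female"}


def training_condition(source_gender: str, target_gender: str) -> float:
    """Map a (source, target) pair to a hypothetical scalar condition.

    0.0 = same-gender reconstruction (pretraining stage),
    1.0 = cross-gender conversion (retraining stage).
    """
    if source_gender not in GENDERS or target_gender not in GENDERS:
        raise ValueError("genders must be 'male' or 'female'")
    return 0.0 if source_gender == target_gender else 1.0


@dataclass
class ConditionEmbedding:
    """Hypothetical condition embedding with two learned anchor vectors."""
    same_gender: Vector   # anchor for condition 0.0 (assumed learned in pretraining)
    cross_gender: Vector  # anchor for condition 1.0 (assumed learned in retraining)

    def __call__(self, scale: float) -> Vector:
        # Any scale in [0, 1] linearly interpolates between the two anchors;
        # 0.0 and 1.0 recover the two training conditions exactly.
        if not 0.0 <= scale <= 1.0:
            raise ValueError("scale must lie in [0, 1]")
        return [(1.0 - scale) * s + scale * c
                for s, c in zip(self.same_gender, self.cross_gender)]


def condition_frames(frames: List[Vector], cond: Vector) -> List[Vector]:
    """Append the condition vector to every acoustic frame
    (an assumed, simple way of feeding the condition to the generator)."""
    return [list(frame) + list(cond) for frame in frames]
```

For example, with anchors `[0.0, 0.0]` and `[1.0, -1.0]`, `ConditionEmbedding([0.0, 0.0], [1.0, -1.0])(0.5)` returns `[0.5, -0.5]`, a half-way intensity between reconstruction and full cross-gender conversion. Linear interpolation is only one plausible design; the paper may scale or inject the condition differently.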