The Academia Sinica Systems of Voice Conversion for VCC2020
Main Authors | , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 06.10.2020 |
Summary: | This paper describes the Academia Sinica systems for the two tasks of the Voice Conversion Challenge 2020 (VCC2020): voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2). For both tasks, we followed the cascaded ASR+TTS structure, using phonetic tokens rather than text or characters as the TTS input. For Task 1, we used the International Phonetic Alphabet (IPA) as the input of the TTS model. For Task 2, we used unsupervised phonetic symbols extracted by a vector-quantized variational autoencoder (VQVAE). In the evaluation, the listening test showed that our systems performed well in VCC2020. |
---|---|
DOI: | 10.48550/arxiv.2010.02669 |