Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Published in | 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) pp. 1 - 5 |
---|---|
Main Authors | , , , , , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 22.10.2023 |
Summary: | Speech restoration (SR) is the task of converting degraded speech signals into high-quality ones. In this study, we propose a robust SR model called Miipher and apply it to a new SR application: increasing the amount of high-quality training data for speech generation by converting speech samples collected from the Web to studio quality. To make our SR model robust against various types of degradation, we use (i) a speech representation extracted from w2v-BERT as the input feature, and (ii) a text representation extracted from transcripts via PnG-BERT as a linguistic conditioning feature. Experiments show that Miipher (i) is robust against various types of audio degradation and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web. Audio samples are available at our demo page: google.github.io/df-conformer/miipher/. |
---|---|
ISSN: | 1947-1629 |
DOI: | 10.1109/WASPAA58266.2023.10248089 |