SPEECH RECOGNITION USING UNSPOKEN TEXT AND SPEECH SYNTHESIS

Bibliographic Details
Main Authors PEDRO J MORENO MENGIBAR, CHEN ZHEHUAI, BHUVANA RAMABHADRAN, ANDREW ROSENBERG
Format Patent
Language English, Japanese
Published 10.04.2024
Summary: PROBLEM TO BE SOLVED: To provide a method for speech recognition using unspoken text and speech synthesis.
SOLUTION: A method 500 for simultaneously training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model includes the steps of: acquiring a plurality of training text utterances; generating, for output by the GAN-based TTS model, a synthetic speech representation of each corresponding training text utterance; determining, using an adversarial discriminator, an adversarial loss term indicating an amount of acoustic noise imbalance in a non-synthetic speech representation relative to the corresponding synthetic speech representation of the corresponding training text utterance; and updating parameters of the GAN-based TTS model based on the adversarial loss term.
SELECTED DRAWING: Figure 5
Bibliography: Application Number: JP20240017453
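
The four steps listed in the summary (acquire text, synthesize, score with a discriminator, update the TTS model) can be illustrated with a toy sketch. This is not the patented implementation: every name below (`synthesize`, `adversarial_loss`, `train_step`, `gain`, `d_weight`, `d_bias`) is hypothetical, speech representations are reduced to single numbers, the loss is assumed to be the standard GAN discriminator objective, and the generator update is assumed to be gradient descent on the common non-saturating loss. The speech recognition model that the record says is trained at the same time is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def synthesize(text, gain):
    # Hypothetical stand-in for the GAN-based TTS model: maps a text
    # utterance to one scalar "speech feature" (assumption, not the patent).
    return gain * len(text)


def adversarial_loss(real_feat, synth_feat, d_weight, d_bias):
    # Assumed standard GAN discriminator objective for one pair:
    # -log D(real) - log(1 - D(synthetic)), with a logistic discriminator.
    d_real = sigmoid(d_weight * real_feat + d_bias)
    d_synth = sigmoid(d_weight * synth_feat + d_bias)
    return -math.log(d_real) - math.log(1.0 - d_synth)


def train_step(texts, real_feats, gain, d_weight, d_bias, lr=0.01):
    # One pass over the summarized steps for each training text utterance.
    loss = 0.0
    for text, real in zip(texts, real_feats):
        # Generate a synthetic speech representation of the utterance.
        synth = synthesize(text, gain)
        # Determine the adversarial loss term with the discriminator.
        loss = adversarial_loss(real, synth, d_weight, d_bias)
        # Update the TTS parameter: gradient of -log D(synth) w.r.t. gain.
        d_synth = sigmoid(d_weight * synth + d_bias)
        grad_gain = -(1.0 - d_synth) * d_weight * len(text)
        gain -= lr * grad_gain
    return gain, loss
```

Calling `train_step(["hello", "good morning"], [7.0, 15.0], gain=0.5, d_weight=0.3, d_bias=-2.0)` returns the updated `gain` and the last adversarial loss value. With a positive `d_weight`, each update nudges `gain` upward, moving the synthetic feature toward the region the discriminator scores as non-synthetic.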