SPEECH RECOGNITION USING UNSPOKEN TEXT AND SPEECH SYNTHESIS

Bibliographic Details
Main Authors	PEDRO J MORENO MENGIBAR, CHEN ZHEHUAI, BHUVANA RAMABHADRAN, ANDREW ROSENBERG
Format	Patent
Language	English, Japanese
Published	10.04.2024
Subjects	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
Summary:	PROBLEM TO BE SOLVED: To provide a method of speech recognition that uses unspoken text and speech synthesis. SOLUTION: A method 500 for simultaneously training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model includes: obtaining a plurality of training text utterances; generating, for output by the GAN-based TTS model, a synthetic speech representation of a corresponding training text utterance; determining, using an adversarial discriminator, an adversarial loss term indicative of an amount of acoustic noise imbalance in a non-synthetic speech representation relative to the corresponding synthetic speech representation of the corresponding training text utterance; and updating parameters of the GAN-based TTS model based on the adversarial loss term. SELECTED DRAWING: Figure 5
Bibliography:	Application Number: JP20240017453
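The adversarial step in the summary (synthesize speech from text, score it with a discriminator against non-synthetic speech, update the TTS parameters from the resulting loss) can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration and is not the patented implementation: the names LinearTTS, LogisticDiscriminator, adversarial_loss, discriminator_step and tts_step are hypothetical, the TTS model is reduced to a single linear map from a text embedding to a speech feature vector, the discriminator is a logistic classifier, and the adversarial loss is the standard non-saturating GAN loss. The speech recognition model that the method trains at the same time, and any other loss terms, are left out.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LinearTTS:
    """Hypothetical stand-in for the GAN-based TTS model: maps a text
    embedding to a synthetic speech representation with one linear layer."""

    def __init__(self, d_text, d_speech, rng):
        self.W = rng.normal(scale=0.1, size=(d_speech, d_text))

    def synthesize(self, text_emb):
        return self.W @ text_emb


class LogisticDiscriminator:
    """Hypothetical adversarial discriminator: D(v) = P(v is non-synthetic)."""

    def __init__(self, d_speech, rng):
        self.w = rng.normal(scale=0.1, size=d_speech)
        self.b = 0.0

    def prob_real(self, v):
        return sigmoid(self.w @ v + self.b)


def adversarial_loss(disc, synthetic):
    # Non-saturating generator loss -log D(s): large when the discriminator
    # easily tells the synthetic representation apart from real speech.
    return -np.log(disc.prob_real(synthetic))


def discriminator_step(disc, real, synthetic, lr):
    # One gradient-descent step on -log D(real) - log(1 - D(synthetic)).
    p_real = disc.prob_real(real)
    p_fake = disc.prob_real(synthetic)
    grad_w = -(1.0 - p_real) * real + p_fake * synthetic
    grad_b = -(1.0 - p_real) + p_fake
    disc.w -= lr * grad_w
    disc.b -= lr * grad_b


def tts_step(tts, disc, text_emb, lr):
    """Update the TTS parameters from the adversarial loss term only;
    returns the loss measured before the update."""
    synthetic = tts.synthesize(text_emb)
    loss = adversarial_loss(disc, synthetic)
    # Chain rule: d(-log sigmoid(z))/dz = -(1 - sigmoid(z)) with z = w.s + b,
    # and s = W @ text_emb, so dL/dW = outer(dL/ds, text_emb).
    dloss_ds = -(1.0 - disc.prob_real(synthetic)) * disc.w
    tts.W -= lr * np.outer(dloss_ds, text_emb)
    return loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tts = LinearTTS(d_text=4, d_speech=3, rng=rng)
    disc = LogisticDiscriminator(d_speech=3, rng=rng)
    text_emb = rng.normal(size=4)  # toy embedding of one training text utterance
    real = rng.normal(size=3)      # toy non-synthetic speech representation
    for step in range(200):
        discriminator_step(disc, real, tts.synthesize(text_emb), lr=0.05)
        loss = tts_step(tts, disc, text_emb, lr=0.05)
    print(f"final adversarial loss: {loss:.4f}")
```

The two updates alternate as in ordinary GAN training: the discriminator learns to separate non-synthetic from synthetic representations, and the TTS parameters move in the direction that makes the synthetic representation harder to separate. In a real system both models would be neural networks trained by automatic differentiation rather than with hand-derived gradients.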