SPEECH RHYTHM CONVERSION DEVICE, MODEL LEARNING DEVICE, METHODS FOR THESE, AND PROGRAM

Bibliographic Details
Main Author	HIROYA, Sadao
Format	Patent
Language	English French Japanese
Published	02.07.2020
Subjects	ACOUSTICS MUSICAL INSTRUMENTS PHYSICS SPEECH ANALYSIS OR SYNTHESIS SPEECH OR AUDIO CODING OR DECODING SPEECH OR VOICE PROCESSING SPEECH RECOGNITION

More Information
Summary:	To convert speech rhythm accurately, a model storage unit (10) stores a speech rhythm conversion model. This model is a neural network. It takes as input a first feature vector containing information on the speech rhythm of at least the phonemes extracted from a first audio signal uttered by a speaker of a first group. It converts the speech rhythm of the first audio signal into the speech rhythm of a speaker of a second group and outputs the result. A feature extraction unit (11) extracts, from an input audio signal uttered by a first-group speaker, information on the vocal tract spectrum and information on the speech rhythm. A conversion unit (12) feeds the first feature vector, which contains the speech rhythm information extracted from the input audio signal, into the speech rhythm conversion model and obtains a converted speech rhythm. A voice synthesizer (13) generates an output audio signal from the converted speech rhythm and the vocal tract spectrum information extracted from the input audio signal.
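The abstract describes a pipeline of four units: a stored conversion model (10), feature extraction (11), rhythm conversion (12), and synthesis (13). The Python sketch below is a minimal, hypothetical illustration of how those pieces could fit together. It is not the patented implementation, and it rests on several assumptions. The input is taken to be already segmented into phonemes. Speech rhythm is represented as per-phoneme durations, with the neighbouring phonemes' durations used as context. A tiny one-hidden-layer network with hand-set toy weights stands in for a trained conversion model. The voice synthesizer is approximated by re-timing spectral frames to the converted durations instead of running a vocoder. All names (`Phoneme`, `RhythmConversionModel`, `extract_features`, `convert_rhythm`, `resample`, `synthesize`, `FRAME_SHIFT`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

FRAME_SHIFT = 0.005  # assumed analysis frame shift in seconds (5 ms)


@dataclass
class Phoneme:
    """Hypothetical pre-segmented phoneme from the input audio signal."""
    label: str
    duration: float            # speech rhythm information (seconds)
    frames: List[List[float]]  # vocal tract spectrum frames (assumed non-empty)


class RhythmConversionModel:
    """Stand-in for the stored conversion model: a one-hidden-layer network
    mapping a rhythm feature vector to a converted duration. Toy weights only."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def forward(self, x: List[float]) -> float:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        # Numerically stable softplus keeps the predicted duration positive.
        return max(out, 0.0) + math.log1p(math.exp(-abs(out)))


def extract_features(phonemes: List[Phoneme]) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Split each phoneme into its spectral frames and a rhythm feature vector
    [own duration, previous duration, next duration] (0.0 at the edges)."""
    spectra = [p.frames for p in phonemes]
    durations = [p.duration for p in phonemes]
    rhythm = []
    for i, d in enumerate(durations):
        prev_d = durations[i - 1] if i > 0 else 0.0
        next_d = durations[i + 1] if i + 1 < len(durations) else 0.0
        rhythm.append([d, prev_d, next_d])
    return spectra, rhythm


def convert_rhythm(model: RhythmConversionModel, rhythm: List[List[float]]) -> List[float]:
    """Run every rhythm feature vector through the conversion model."""
    return [model.forward(x) for x in rhythm]


def resample(frames: List[List[float]], n: int) -> List[List[float]]:
    """Nearest-neighbour time stretch of a frame sequence to n frames."""
    m = len(frames)
    return [frames[min(m - 1, i * m // n)] for i in range(n)]


def synthesize(spectra: List[List[List[float]]], durations: List[float]) -> List[List[float]]:
    """Re-time each phoneme's spectral frames to its converted duration.
    A real voice synthesizer would pass the result to a vocoder."""
    out: List[List[float]] = []
    for frames, dur in zip(spectra, durations):
        n = max(1, round(dur / FRAME_SHIFT))
        out.extend(resample(frames, n))
    return out


if __name__ == "__main__":
    utterance = [
        Phoneme("h", 0.06, [[0.1, 0.2]] * 12),
        Phoneme("e", 0.12, [[0.3, 0.4]] * 24),
        Phoneme("l", 0.05, [[0.2, 0.1]] * 10),
    ]
    model = RhythmConversionModel(
        w1=[[1.0, 0.5, 0.5], [-1.0, 0.2, 0.2]], b1=[0.0, 0.0],
        w2=[0.1, -0.05], b2=-2.5,
    )
    spectra, rhythm = extract_features(utterance)
    new_durations = convert_rhythm(model, rhythm)
    frames = synthesize(spectra, new_durations)
    print([round(d, 3) for d in new_durations], len(frames))
```

The split mirrors the abstract: the vocal tract spectrum passes through unchanged, and only the rhythm is modified before resynthesis. In a real system the network weights would come from training rather than being set by hand, and the re-timed frames would drive a vocoder to produce the output audio signal. Neither of those steps is shown here.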
Bibliography:	Application Number: WO2019JP24438