VOICE CONVERSION METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM

Bibliographic Details
Main Authors	SUN TAO, WANG WENFU, WANG XILEI
Format	Patent
Language	English, Korean
Published	30.08.2021
Subjects	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
Summary:	Disclosed are a voice conversion method, a voice conversion apparatus, and an electronic device, relating to the fields of voice conversion, voice interaction, natural language processing, and deep learning. In a specific implementation, the method comprises: acquiring a source voice of a first user and a reference voice of a second user; extracting first voice content information and a first acoustic feature from the source voice; extracting a second acoustic feature from the reference voice; inputting the first voice content information, the first acoustic feature, and the second acoustic feature into a pre-trained voice conversion model to obtain a reconstructed third acoustic feature, wherein the pre-trained voice conversion model is obtained by training on the voice of a third user; and synthesizing a target voice according to the third acoustic feature. Because the target voice is synthesized from the reconstructed third acoustic feature output by the pre-trained model, the waiting time for voice conversion can be reduced.
Bibliography:	Application Number: KR20210105264
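
The data flow listed in the summary can be sketched in a few lines of Python. This is only an illustration of the order of the steps, not the patented implementation: the class name `VoiceConversionPipeline`, its four callables, and the `toy_*` stand-ins are all hypothetical, and the record does not say which feature extractors, model architecture, or vocoder are actually used.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Waveform = Sequence[float]
Features = List[List[float]]  # frames x feature dimensions


@dataclass
class VoiceConversionPipeline:
    """Hypothetical container for the four components named in the summary."""
    extract_content: Callable[[Waveform], Features]   # voice content information
    extract_acoustic: Callable[[Waveform], Features]  # acoustic features
    conversion_model: Callable[[Features, Features, Features], Features]
    vocoder: Callable[[Features], List[float]]        # features -> waveform

    def convert(self, source_voice: Waveform, reference_voice: Waveform) -> List[float]:
        # Extract content information and acoustic features from the source voice.
        content_1 = self.extract_content(source_voice)
        acoustic_1 = self.extract_acoustic(source_voice)
        # Extract acoustic features from the reference voice.
        acoustic_2 = self.extract_acoustic(reference_voice)
        # Reconstruct the third acoustic feature with the pre-trained model.
        acoustic_3 = self.conversion_model(content_1, acoustic_1, acoustic_2)
        # Synthesize the target voice from the third acoustic feature.
        return self.vocoder(acoustic_3)


# Toy stand-ins (assumptions for illustration only; real systems would use
# learned extractors, a trained neural model, and a neural or DSP vocoder).
def toy_frames(wave: Waveform, size: int = 4) -> Features:
    """Split a waveform into non-overlapping frames of `size` samples."""
    return [list(wave[i:i + size]) for i in range(0, len(wave) - size + 1, size)]


def toy_model(content: Features, acoustic_src: Features, acoustic_ref: Features) -> Features:
    """Shift every source frame by the mean value of the reference features."""
    assert len(content) == len(acoustic_src), "content and acoustic frames must align"
    ref_values = [x for frame in acoustic_ref for x in frame]
    ref_mean = sum(ref_values) / len(ref_values)
    return [[x + ref_mean for x in frame] for frame in acoustic_src]


def toy_vocoder(features: Features) -> List[float]:
    """Concatenate frames back into a flat sample list."""
    return [x for frame in features for x in frame]


pipeline = VoiceConversionPipeline(toy_frames, toy_frames, toy_model, toy_vocoder)
target_voice = pipeline.convert(source_voice=[0.0] * 8, reference_voice=[1.0] * 4)
```

Passing the components in as callables keeps the sketch independent of any particular model; only the order of the calls in `convert` is taken from the summary.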