Hybrid multilingual text-dependent and text-independent speaker verification


Bibliographic Details
Main Authors	CHOJNACKA ROZA, PELECANOS JASON, LOPEZ MORENO IGNACIO, WANG QUAN
Format	Patent
Language	Korean
Published	13.11.2023
Subjects	ACOUSTICS; CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Summary:	A speaker verification method (400) includes receiving audio data (120) corresponding to an utterance (119), processing a first portion (121) of the audio data that characterizes a predetermined hotword to generate a text-dependent evaluation vector (214), and generating one or more text-dependent confidence scores (215). When one of the text-dependent confidence scores satisfies a threshold, the operations include identifying the speaker of the utterance as the respective enrolled user associated with the text-dependent confidence score that satisfies the threshold, and initiating performance of an action without performing speaker verification. When none of the text-dependent confidence scores satisfies the threshold, the operations include processing a second portion (122) of the audio data that characterizes a query to generate a text-independent evaluation vector (224), generating one or more text-independent confidence scores (225), and determining whether the identity of the speaker of the utterance includes any of the enrolled users.
Bibliography:	Application Number: KR20237035935
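
The two-stage flow in the summary can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the use of cosine similarity as the confidence score, the text-independent threshold, and both threshold values are assumptions made only for illustration. The record itself states only that confidence scores are generated and that the text-dependent scores are compared against a threshold.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    # Assumed scoring function; the summary does not name one.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_against_enrolled(evaluation: Vector,
                           references: Dict[str, Vector]) -> Dict[str, float]:
    # One confidence score per enrolled user.
    return {user: cosine_similarity(evaluation, ref)
            for user, ref in references.items()}


def verify_speaker(td_evaluation: Vector,
                   td_references: Dict[str, Vector],
                   compute_ti_evaluation: Callable[[], Vector],
                   ti_references: Dict[str, Vector],
                   td_threshold: float = 0.8,   # hypothetical value
                   ti_threshold: float = 0.7,   # hypothetical value
                   ) -> Tuple[Optional[str], str]:
    """Return (identified user or None, name of the stage that decided)."""
    # Stage 1: text-dependent scores from the hotword portion of the audio.
    td_scores = score_against_enrolled(td_evaluation, td_references)
    if td_scores:
        user, best = max(td_scores.items(), key=lambda kv: kv[1])
        if best >= td_threshold:
            # Early accept: the query portion is never processed.
            return user, "text-dependent"

    # Stage 2: fall back to text-independent scores from the query portion.
    ti_scores = score_against_enrolled(compute_ti_evaluation(), ti_references)
    if ti_scores:
        user, best = max(ti_scores.items(), key=lambda kv: kv[1])
        if best >= ti_threshold:
            return user, "text-independent"
    return None, "text-independent"
```

Passing the text-independent evaluation vector as a callable reflects the ordering in the summary: the second portion of the audio is processed only when no text-dependent score satisfies the threshold.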