End-to-end speech diarization via iterative speaker embedding

Bibliographic Details
Main Authors	Grangier, David; Teboul, Oliver; Zeghidour, Neil
Format	Patent
Language	English
Published	30.01.2024
Subjects	ACOUSTICS; CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Abstract	A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations, each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining, for each temporal embedding in the sequence, a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration, and selecting, based on those probabilities, one of the temporal embeddings as the respective speaker embedding for the respective speaker. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding at that time step.
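The loop described in the abstract can be illustrated with a small sketch. This is not the patented implementation: the record gives no model details, so the two scoring functions below (`new_speaker_probability` and the activity score inside `predict_voice_activity`) are hand-written, hypothetical stand-ins for what would presumably be learned networks, and every function name is invented for illustration. Taking the time step with the highest new-speaker probability is likewise an assumption of this sketch. The temporal embeddings are supplied directly as toy 2-D vectors rather than produced by an encoder.

```python
import math
from typing import List

Vector = List[float]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def _activity(h: Vector) -> float:
    """Toy voice-activity score in [0, 1): larger embeddings count as 'louder'."""
    return 1.0 - math.exp(-_norm(h))


def new_speaker_probability(h: Vector, selected: List[Vector]) -> float:
    """Hypothetical stand-in for a learned selector: high when the frame is
    active and unlike every speaker embedding chosen in earlier iterations."""
    novelty = 1.0
    for e in selected:
        novelty *= 1.0 - max(0.0, _cosine(h, e))
    return _activity(h) * novelty


def select_speaker_embeddings(H: List[Vector], num_speakers: int) -> List[Vector]:
    """One iteration per speaker: score all T temporal embeddings, keep the best.
    Assumption: 'best' means highest new-speaker probability."""
    selected: List[Vector] = []
    for _ in range(num_speakers):
        probs = [new_speaker_probability(h, selected) for h in H]
        best_t = max(range(len(H)), key=probs.__getitem__)
        selected.append(H[best_t])
    return selected


def predict_voice_activity(H: List[Vector], selected: List[Vector],
                           threshold: float = 0.5) -> List[List[int]]:
    """For each time step, one 0/1 indicator per selected speaker, computed from
    that step's temporal embedding and the speaker embedding (toy scoring)."""
    return [
        [1 if _activity(h) * max(0.0, _cosine(h, e)) > threshold else 0
         for e in selected]
        for h in H
    ]


# Toy input: T = 5 embeddings, two speakers on roughly orthogonal axes,
# with one silent frame in the middle.
H = [[1.0, 0.0], [0.9, 0.1], [0.0, 0.0], [0.0, 2.0], [0.1, 1.8]]
speakers = select_speaker_embeddings(H, num_speakers=2)
vad = predict_voice_activity(H, speakers)
print(speakers)  # [[0.0, 2.0], [1.0, 0.0]]
print(vad)       # [[0, 1], [0, 1], [0, 0], [1, 0], [1, 0]]
```

In this toy run the first iteration picks the most active frame, the second iteration suppresses frames resembling the first pick and so lands on the other speaker, and the final pass yields one activity indicator per speaker at each time step.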
DatabaseName	esp@cenet
ExternalDocumentID	US11887623B2
Notes	Application Number: US202117304514
OpenAccessLink	https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240130&DB=EPODOC&CC=US&NR=11887623B2
RelatedCompanies	Google LLC