End-to-end speech diarization via iterative speaker embedding

Bibliographic Details
Main Authors: Grangier, David; Teboul, Oliver; Zeghidour, Neil
Format: Patent
Language: English
Published: 30.01.2024
Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.
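The flow described in the abstract (select one speaker embedding per iteration, then predict per-speaker voice activity at each time step) can be illustrated with a small sketch. This is not the patented implementation: the record does not describe the encoder, the probability model, or the voice activity predictor, so `new_speaker_probability` and `predict_voice_activity` below are hypothetical hand-written stand-ins for what would presumably be learned components. The sketch also assumes that the temporal embedding with the highest probability is the one selected in each iteration, and it takes the T temporal embeddings as given rather than encoding audio.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def new_speaker_probability(h_t, selected):
    """HYPOTHETICAL scorer (assumption, not from the source).

    Approximates "voice activity by a single new speaker" as:
    an activity term from the embedding norm, discounted by the
    similarity to every speaker embedding already selected.
    """
    norm = math.sqrt(sum(x * x for x in h_t))
    p = math.tanh(norm)  # activity term in [0, 1)
    for e in selected:
        p *= 1.0 - max(0.0, cosine(h_t, e))
    return p


def select_speaker_embeddings(temporal, num_speakers):
    """One iteration per speaker: score all T embeddings, keep the best."""
    selected = []
    for _ in range(num_speakers):
        probs = [new_speaker_probability(h, selected) for h in temporal]
        # Assumption: pick the time step with the highest probability
        # (the first such step if there is a tie).
        best_t = max(range(len(temporal)), key=probs.__getitem__)
        selected.append(temporal[best_t])
    return selected


def predict_voice_activity(temporal, speakers, threshold=0.5):
    """HYPOTHETICAL predictor: one 0/1 indicator per (time step, speaker),
    computed from the temporal embedding and each speaker embedding."""
    return [
        [1 if cosine(h, e) > threshold else 0 for e in speakers]
        for h in temporal
    ]


# Toy input: T = 5 two-dimensional "temporal embeddings" (made-up values).
temporal = [[1.0, 0.0], [0.9, 0.1], [0.0, 0.0], [0.0, 1.0], [0.1, 2.0]]
speakers = select_speaker_embeddings(temporal, num_speakers=2)
for t, row in enumerate(predict_voice_activity(temporal, speakers)):
    print(t, row)
```

In this toy setup the second iteration is steered away from time steps that resemble the first selected embedding, which is the role the "not previously selected during a previous iteration" condition plays in the abstract. A real system would replace both stand-in functions with trained models.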
External Document ID: US11887623B2
Application Number: US202117304514
Open Access Link: https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240130&DB=EPODOC&CC=US&NR=11887623B2
Related Companies: Google LLC
Subject Terms: ACOUSTICS; CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION