End-to-end speech diarization via iterative speaker embedding
Main Authors | Grangier, David; Zeghidour, Neil; Teboul, Oliver |
---|---|
Format | Patent |
Language | English |
Published | 30.01.2024 |
Abstract | A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding. |
---|---|
Author | Grangier, David; Zeghidour, Neil; Teboul, Oliver |
DatabaseName | esp@cenet |
Discipline | Medicine; Chemistry; Sciences; Physics |
ExternalDocumentID | US11887623B2 |
Notes | Application Number: US202117304514 |
OpenAccessLink | https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240130&DB=EPODOC&CC=US&NR=11887623B2 |
PublicationDateYYYYMMDD | 2024-01-30 |
RelatedCompanies | Google LLC |
SubjectTerms | ACOUSTICS; CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION |
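
The two stages described in the abstract, iterative selection of one speaker embedding per speaker and then per-time-step voice activity prediction, can be illustrated with a minimal Python sketch. The record does not describe the encoder, the probability model, or the voice activity predictor, so everything below is a hypothetical stand-in: `new_speaker_probability` replaces whatever learned scorer the patent uses, picking the highest-probability time step is an assumed selection rule, and the cosine-similarity threshold in `predict_voice_activity` is an assumed predictor. The temporal embeddings are taken as given rather than computed from audio.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    denom = norm(a) * norm(b)
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom


def new_speaker_probability(h_t, selected):
    """Hypothetical scorer (not from the patent): probability-like score that
    h_t contains voice activity from a speaker not yet selected."""
    activity = 1.0 - math.exp(-norm(h_t))  # grows with embedding energy
    novelty = 1.0
    for e in selected:
        # Penalize similarity to speakers chosen in earlier iterations.
        novelty *= 1.0 - max(cosine(h_t, e), 0.0)
    return activity * novelty


def select_speaker_embeddings(temporal_embeddings, num_speakers):
    """One iteration per speaker: score every time step, then take the
    temporal embedding with the highest score (assumed selection rule)."""
    selected = []
    for _ in range(num_speakers):
        probs = [new_speaker_probability(h, selected) for h in temporal_embeddings]
        best_t = max(range(len(probs)), key=probs.__getitem__)
        selected.append(temporal_embeddings[best_t])
    return selected


def predict_voice_activity(temporal_embeddings, speakers, threshold=0.5):
    """Assumed predictor: speaker n is active at step t if the temporal
    embedding is close (in cosine similarity) to that speaker's embedding."""
    return [[cosine(h, e) > threshold for e in speakers] for h in temporal_embeddings]


# Toy input: T = 5 time steps, two speakers, one silent step in the middle.
H = [[3.0, 0.0], [3.0, 0.0], [0.0, 0.0], [0.0, 3.0], [0.0, 3.0]]
speakers = select_speaker_embeddings(H, num_speakers=2)
vad = predict_voice_activity(H, speakers)
for t, row in enumerate(vad):
    print(t, row)
```

In this toy input the first iteration picks an embedding from the first speaker's segment, and the second iteration is steered away from it by the novelty term, so it lands in the second speaker's segment. A real system would learn both the scorer and the predictor jointly, which this sketch does not attempt.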