System and Method for Audio Processing using Time-Invariant Speaker Embeddings
Main Authors | Le Roux, Jonathan; Subramanian, Aswin Shanmugam; Wichern, Gordon; Böddeker, Christoph |
---|---|
Format | Patent |
Language | English |
Published | 12.09.2024 |
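Subjects | ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION |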
Abstract | A system and method for sound processing for performing multi-talker conversation analysis is provided. The sound processing system includes a deep neural network trained for processing audio segments of an audio mixture of the multi-talker conversation. The deep neural network includes a speaker-independent layer that produces a speaker-independent output, and a speaker-biased layer that is applied once, independently, to each of the audio segments for each of the multiple speakers of the audio mixture. The deep neural network also processes time-invariant speaker embeddings: each application of the speaker-biased layer is individually assigned to a corresponding speaker by inputting that speaker's time-invariant embedding. The deep neural network thus produces data indicative of time-frequency activity regions of each speaker of the multiple speakers in the audio mixture from a combination of speaker-biased outputs. |
---|---|
Author | Le Roux, Jonathan; Subramanian, Aswin Shanmugam; Wichern, Gordon; Böddeker, Christoph |
DatabaseName | esp@cenet |
Discipline | Medicine Chemistry Sciences Physics |
ExternalDocumentID | US2024304205A1 |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Notes | Application Number: US202318224659 |
OpenAccessLink | https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240912&DB=EPODOC&CC=US&NR=2024304205A1 |
RelatedCompanies | Mitsubishi Electric Research Laboratories, Inc |
SubjectTerms | ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION |
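The data flow described in the abstract can be illustrated with a small toy model. The sketch below is not the patented network: the class name `ToyDiarizer`, the layer shapes, the additive way the embedding biases the shared features, and the random untrained weights are all assumptions made for illustration. It shows only the structure the abstract names: a speaker-independent layer computed once per audio segment, a speaker-biased layer applied once per speaker and conditioned on a fixed (time-invariant) embedding, and a stacked per-speaker time-frequency activity output.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyDiarizer:
    """Hypothetical toy model with random weights (assumption, not the patent's design)."""

    def __init__(self, n_freq, hidden, emb_dim, seed=0):
        rng = random.Random(seed)
        self.W_shared = random_matrix(hidden, n_freq, rng)  # speaker-independent layer
        self.W_bias = random_matrix(hidden, emb_dim, rng)   # maps an embedding to a bias
        self.W_out = random_matrix(n_freq, hidden, rng)     # projects back to frequency bins

    def speaker_independent(self, segment):
        # segment: list of frames, each frame a list of n_freq magnitudes
        return [[math.tanh(h) for h in matvec(self.W_shared, frame)]
                for frame in segment]

    def speaker_biased(self, shared, embedding):
        # The bias is derived from the embedding alone, so it is identical
        # for every frame of the segment (time-invariant conditioning).
        bias = matvec(self.W_bias, embedding)
        activity = []
        for h in shared:
            biased = [math.tanh(a + b) for a, b in zip(h, bias)]
            activity.append([sigmoid(z) for z in matvec(self.W_out, biased)])
        return activity

    def forward(self, segment, embeddings):
        shared = self.speaker_independent(segment)  # computed once per segment
        # One independent application of the speaker-biased layer per speaker.
        return [self.speaker_biased(shared, e) for e in embeddings]


# Usage: one segment of 3 frames x 4 frequency bins, two speakers.
rng = random.Random(1)
segment = [[rng.random() for _ in range(4)] for _ in range(3)]
embeddings = [[1.0, 0.0], [0.0, 1.0]]  # placeholder speaker embeddings
model = ToyDiarizer(n_freq=4, hidden=5, emb_dim=2)
activity = model.forward(segment, embeddings)  # indexed [speaker][frame][frequency]
```

Because the shared features are computed once and only the embedding changes between applications, adding a speaker in this sketch costs one more pass through the speaker-biased layer rather than a full re-run of the model. How the actual system combines the speaker-biased outputs is not detailed in the abstract; stacking them per speaker is the simplest reading and is assumed here.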