System and Method for Audio Processing using Time-Invariant Speaker Embeddings

A system and method for sound processing for performing multi-talker conversation analysis is provided. The sound processing system includes a deep neural network trained for processing audio segments of an audio mixture of the multi-talker conversation. The deep neural network includes a speaker-in...

Bibliographic Details
Main Authors Le Roux, Jonathan; Subramanian, Aswin Shanmugam; Böddeker, Christoph; Wichern, Gordon
Format Patent
Language English
Published 12.09.2024
Abstract A system and method for sound processing for performing multi-talker conversation analysis is provided. The sound processing system includes a deep neural network trained for processing audio segments of an audio mixture of the multi-talker conversation. The deep neural network includes a speaker-independent layer that produces a speaker-independent output, and a speaker-biased layer applied once independently to each of the audio segments for each of the multiple speakers of the audio mixture. The deep neural network also processes a time-invariant embedding by individually assigning each application of the speaker-biased layer to a corresponding speaker by inputting the corresponding time-invariant speaker embedding. The deep neural network thus produces data indicative of time-frequency activity regions of each speaker of the multiple speakers in the audio mixture from a combination of speaker-biased outputs.
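The data flow described in the abstract can be illustrated with a small sketch. This is not the patented implementation: the abstract does not say how the embedding biases the layer or how the speaker-biased outputs are combined, so the additive bias, the tanh nonlinearities, the softmax across speakers, the random weights, and every name below (speaker_activity, w_shared, w_embed, w_out) are assumptions made purely for illustration.

```python
import math
import random

def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def speaker_activity(frames, embeddings, w_shared, w_embed, w_out):
    """Return activity[k][t][f] in [0, 1] for speaker k, frame t, frequency bin f."""
    # Speaker-independent layer: identical computation regardless of speaker.
    shared = [[math.tanh(v) for v in linear(w_shared, frame)] for frame in frames]

    # Speaker-biased layer: applied once per speaker. The speaker's embedding
    # is time-invariant, so its bias is computed once and reused for all frames.
    # (Additive biasing is an assumption, not taken from the abstract.)
    logits = []
    for emb in embeddings:
        bias = linear(w_embed, emb)
        per_frame = []
        for hidden in shared:
            biased = [math.tanh(a + b) for a, b in zip(hidden, bias)]
            per_frame.append(linear(w_out, biased))
        logits.append(per_frame)

    # Combination of speaker-biased outputs (assumed): softmax across speakers
    # at each time-frequency bin.
    n_spk, n_frames, n_bins = len(embeddings), len(frames), len(w_out)
    activity = [[[0.0] * n_bins for _ in range(n_frames)] for _ in range(n_spk)]
    for t in range(n_frames):
        for f in range(n_bins):
            column = [logits[k][t][f] for k in range(n_spk)]
            peak = max(column)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in column]
            total = sum(exps)
            for k in range(n_spk):
                activity[k][t][f] = exps[k] / total
    return activity

# Toy dimensions and random inputs, chosen only to make the sketch runnable.
rng = random.Random(0)
N_BINS, N_HIDDEN, N_EMBED, N_FRAMES, N_SPEAKERS = 4, 6, 3, 5, 2
frames = [[rng.random() for _ in range(N_BINS)] for _ in range(N_FRAMES)]
embeddings = [[rng.uniform(-1, 1) for _ in range(N_EMBED)] for _ in range(N_SPEAKERS)]
w_shared = rand_matrix(N_HIDDEN, N_BINS, rng)
w_embed = rand_matrix(N_HIDDEN, N_EMBED, rng)
w_out = rand_matrix(N_BINS, N_HIDDEN, rng)
activity = speaker_activity(frames, embeddings, w_shared, w_embed, w_out)
```

A softmax across speakers forces the activities at each time-frequency bin to sum to one. A per-speaker sigmoid would be an equally plausible choice when overlapping speech should be allowed to activate several speakers at once.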
ContentType Patent
DBID EVB
DatabaseName esp@cenet
Database_xml – sequence: 1
  dbid: EVB
  name: esp@cenet
  url: http://worldwide.espacenet.com/singleLineSearch?locale=en_EP
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
Discipline Medicine
Chemistry
Sciences
Physics
ExternalDocumentID US2024304205A1
GroupedDBID EVB
ID FETCH-epo_espacenet_US2024304205A13
IEDL.DBID EVB
IngestDate Fri Oct 11 05:28:25 EDT 2024
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-epo_espacenet_US2024304205A13
Notes Application Number: US202318224659
OpenAccessLink https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240912&DB=EPODOC&CC=US&NR=2024304205A1
ParticipantIDs epo_espacenet_US2024304205A1
PublicationCentury 2000
PublicationDate 20240912
PublicationDateYYYYMMDD 2024-09-12
PublicationDate_xml – month: 09
  year: 2024
  text: 20240912
  day: 12
PublicationDecade 2020
PublicationYear 2024
RelatedCompanies Mitsubishi Electric Research Laboratories, Inc
RelatedCompanies_xml – name: Mitsubishi Electric Research Laboratories, Inc
Score 3.5658567
SourceID epo
SourceType Open Access Repository
SubjectTerms ACOUSTICS
MUSICAL INSTRUMENTS
PHYSICS
SPEECH ANALYSIS OR SYNTHESIS
SPEECH OR AUDIO CODING OR DECODING
SPEECH OR VOICE PROCESSING
SPEECH RECOGNITION
Title System and Method for Audio Processing using Time-Invariant Speaker Embeddings
URI https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240912&DB=EPODOC&locale=&CC=US&NR=2024304205A1
hasFullText 1
inHoldings 1
linkProvider European Patent Office