System and Method for Audio Processing using Time-Invariant Speaker Embeddings

A system and method for sound processing for performing multi-talker conversation analysis is provided. The sound processing system includes a deep neural network trained for processing audio segments of an audio mixture of the multi-talker conversation. The deep neural network includes a speaker-in...
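The layer arrangement named in the abstract can be illustrated with a small NumPy sketch. This is not the patented implementation. The layer sizes, the random untrained weights, the additive way the embedding biases the hidden features, and every function and parameter name (`make_params`, `speaker_independent_layer`, `speaker_biased_layer`, `analyze_segment`) are assumptions made for illustration. The sketch only shows the data flow: one shared pass per audio segment, then one speaker-biased pass per speaker, each keyed by that speaker's fixed embedding.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_params(n_freq, n_hidden, n_emb, seed=0):
    # Hypothetical, untrained weights; a real system would learn these.
    rng = np.random.default_rng(seed)
    return {
        "w_shared": rng.standard_normal((n_freq, n_hidden)) * 0.1,
        "w_emb": rng.standard_normal((n_emb, n_hidden)) * 0.1,
        "w_out": rng.standard_normal((n_hidden, n_freq)) * 0.1,
    }


def speaker_independent_layer(segment, params):
    # segment: (frames, freq_bins). No speaker information enters here.
    return np.tanh(segment @ params["w_shared"])


def speaker_biased_layer(hidden, embedding, params):
    # The embedding has no time axis, so the same bias is added to every frame.
    # Additive biasing is an assumption of this sketch.
    bias = embedding @ params["w_emb"]
    return sigmoid((hidden + bias) @ params["w_out"])


def analyze_segment(segment, embeddings, params):
    # Shared features are computed once per segment ...
    hidden = speaker_independent_layer(segment, params)
    # ... then the speaker-biased layer is applied once per speaker.
    outputs = [speaker_biased_layer(hidden, e, params) for e in embeddings]
    # Combine per-speaker outputs: (speakers, frames, freq_bins).
    return np.stack(outputs)


rng = np.random.default_rng(1)
params = make_params(n_freq=8, n_hidden=16, n_emb=4)
embeddings = rng.standard_normal((3, 4))  # one fixed embedding per speaker
segments = [np.abs(rng.standard_normal((5, 8))) for _ in range(2)]
activity = [analyze_segment(s, embeddings, params) for s in segments]
print(activity[0].shape)  # (3, 5, 8)
```

Because the same `embeddings` array is reused for every segment, row `k` of each output refers to the same speaker across segments. That consistency is the role a time-invariant embedding plays in this sketch.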

Bibliographic Details
Main Authors	Le Roux, Jonathan; Subramanian, Aswin Shanmugam; Böddeker, Christoph; Wichern, Gordon
Format	Patent
Language	English
Published	12.09.2024
Subjects	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Abstract	A system and method for sound processing that perform multi-talker conversation analysis are provided. The sound processing system includes a deep neural network trained to process audio segments of an audio mixture of the multi-talker conversation. The deep neural network includes a speaker-independent layer that produces a speaker-independent output, and a speaker-biased layer that is applied once, independently, to each of the audio segments for each of multiple speakers in the audio mixture. The deep neural network also processes time-invariant speaker embeddings: each application of the speaker-biased layer is assigned to a corresponding speaker by inputting that speaker's time-invariant embedding. The deep neural network thus produces, from a combination of the speaker-biased outputs, data indicative of the time-frequency activity regions of each of the multiple speakers in the audio mixture.
Author	Le Roux, Jonathan; Subramanian, Aswin Shanmugam; Wichern, Gordon; Böddeker, Christoph
Author_xml	– fullname: Le Roux, Jonathan – fullname: Subramanian, Aswin Shanmugam – fullname: Böddeker, Christoph – fullname: Wichern, Gordon
ContentType	Patent
DBID	EVB
DatabaseName	esp@cenet
DatabaseTitleList
Database_xml	– sequence: 1 dbid: EVB name: esp@cenet url: http://worldwide.espacenet.com/singleLineSearch?locale=en_EP sourceTypes: Open Access Repository
DeliveryMethod	fulltext_linktorsrc
Discipline	Medicine Chemistry Sciences Physics
ExternalDocumentID	US2024304205A1
GroupedDBID	EVB
ID	FETCH-epo_espacenet_US2024304205A13
IEDL.DBID	EVB
IngestDate	Fri Oct 11 05:28:25 EDT 2024
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-epo_espacenet_US2024304205A13
Notes	Application Number: US202318224659
OpenAccessLink	https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240912&DB=EPODOC&CC=US&NR=2024304205A1
ParticipantIDs	epo_espacenet_US2024304205A1
PublicationCentury	2000
PublicationDate	20240912
PublicationDateYYYYMMDD	2024-09-12
PublicationDate_xml	– month: 09 year: 2024 text: 20240912 day: 12
PublicationDecade	2020
PublicationYear	2024
RelatedCompanies	Mitsubishi Electric Research Laboratories, Inc
RelatedCompanies_xml	– name: Mitsubishi Electric Research Laboratories, Inc
Score	3.5658567
SourceID	epo
SourceType	Open Access Repository
SubjectTerms	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
Title	System and Method for Audio Processing using Time-Invariant Speaker Embeddings
URI	https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240912&DB=EPODOC&locale=&CC=US&NR=2024304205A1
linkProvider	European Patent Office