System and Method for Audio Processing using Time-Invariant Speaker Embeddings
Main Authors | , , , |
---|---|
Format | Patent |
Language | English |
Published | 12.09.2024 |
Subjects | |
Summary: | A system and method for sound processing that perform multi-talker conversation analysis are provided. The sound processing system includes a deep neural network trained to process audio segments of an audio mixture of the multi-talker conversation. The deep neural network includes a speaker-independent layer that produces a speaker-independent output, and a speaker-biased layer that is applied once, independently, to each of the audio segments for each of the multiple speakers in the audio mixture. The deep neural network also processes time-invariant speaker embeddings: each application of the speaker-biased layer is individually assigned to a corresponding speaker by inputting that speaker's time-invariant embedding. From a combination of the speaker-biased outputs, the deep neural network thus produces data indicative of the time-frequency activity regions of each of the multiple speakers in the audio mixture. |
---|---|
Bibliography: | Application Number: US202318224659 |
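
The data flow described in the summary can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the layer shapes, the `tanh` and sigmoid nonlinearities, the additive way the embedding biases the output, and every name below (`speaker_independent_layer`, `speaker_biased_layer`, `activity_regions`, `params`) are assumptions made for illustration, and the weights are random rather than trained. The sketch only shows one shared layer computed once per segment, one speaker-biased application per speaker driven by that speaker's time-invariant embedding, and the per-speaker outputs stacked into one array.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def speaker_independent_layer(segment, w_shared):
    # segment: (frames, freq_bins) -> hidden: (frames, hidden_dim).
    # Computed once per segment; it does not depend on any speaker.
    return np.tanh(segment @ w_shared)


def speaker_biased_layer(hidden, embedding, w_out, w_emb):
    # embedding: (emb_dim,). The same vector is used for every frame,
    # which is what makes it time-invariant in this sketch.
    bias = embedding @ w_emb                 # (freq_bins,)
    return sigmoid(hidden @ w_out + bias)    # (frames, freq_bins)


def activity_regions(segment, embeddings, params):
    # One speaker-biased application per speaker, each assigned to its
    # speaker by passing that speaker's embedding.
    hidden = speaker_independent_layer(segment, params["w_shared"])
    per_speaker = [
        speaker_biased_layer(hidden, e, params["w_out"], params["w_emb"])
        for e in embeddings
    ]
    # Assumed combination step: stack to (speakers, frames, freq_bins).
    return np.stack(per_speaker)


# Hypothetical sizes and random (untrained) weights, for shape checking only.
rng = np.random.default_rng(0)
frames, freq_bins, hidden_dim, emb_dim, speakers = 5, 8, 16, 4, 3
params = {
    "w_shared": rng.normal(size=(freq_bins, hidden_dim)),
    "w_out": rng.normal(size=(hidden_dim, freq_bins)),
    "w_emb": rng.normal(size=(emb_dim, freq_bins)),
}
segment = np.abs(rng.normal(size=(frames, freq_bins)))  # stand-in spectrogram
embeddings = rng.normal(size=(speakers, emb_dim))
activity = activity_regions(segment, embeddings, params)
```

In this sketch `activity[s, t, f]` is a score between 0 and 1 for speaker `s` at frame `t` and frequency bin `f`. Because the speakers share all weights and differ only in their embedding, reordering the embeddings simply reorders the output slices.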