A weighted-variance variational autoencoder model for speech enhancement

Bibliographic Details
Published in	arXiv.org
Main Authors	Golmakani, Ali; Sadeghi, Mostafa; Alameda-Pineda, Xavier; Serizel, Romain
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 26.10.2023
Subjects	Algorithms; Learning; Modelling; Normal distribution; Probability distribution; Speech processing; Statistical analysis; Variance

Summary:	We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted-variance generative model, in which the contribution of each spectrogram time frame to parameter learning is weighted. We impose a Gamma prior distribution on the weights, which effectively leads to a Student's t-distribution instead of a Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted-variance model.
ISSN:	2331-8422
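
Note:	The Summary states that a Gamma prior on the variance weights effectively yields a Student's t-distribution. The sketch below checks that relationship numerically in a simplified setting. It is not the paper's model: it assumes a real-valued scalar x (the paper works with complex-valued TF coefficients), a weight w that divides a fixed variance scale**2, and a Gamma(nu/2, rate nu/2) prior on w. That parametrization and all function names are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, var):
    """Zero-mean real Gaussian density with variance `var`."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def gamma_pdf(w, shape, rate):
    """Gamma density in the shape/rate parametrization."""
    return rate ** shape * w ** (shape - 1.0) * math.exp(-rate * w) / math.gamma(shape)


def student_t_pdf(x, nu, scale):
    """Zero-mean Student's t density with `nu` degrees of freedom and a scale."""
    norm = math.gamma((nu + 1.0) / 2.0) / (
        math.gamma(nu / 2.0) * math.sqrt(nu * math.pi) * scale
    )
    return norm * (1.0 + (x / scale) ** 2 / nu) ** (-(nu + 1.0) / 2.0)


def weighted_variance_marginal(x, nu, scale, n_steps=20000, w_max=40.0):
    """Integrate N(x; 0, scale^2 / w) * Gamma(w; nu/2, nu/2) over w.

    Uses a plain midpoint rule on (0, w_max); the Gamma tail beyond w_max
    is negligible for the values used here (assumed setting, not the paper's).
    """
    step = w_max / n_steps
    total = 0.0
    for i in range(n_steps):
        w = (i + 0.5) * step
        total += gaussian_pdf(x, scale ** 2 / w) * gamma_pdf(w, nu / 2.0, nu / 2.0)
    return total * step


if __name__ == "__main__":
    nu, scale = 4.0, 1.5
    for x in (0.0, 0.5, 2.0):
        mixture = weighted_variance_marginal(x, nu, scale)
        closed_form = student_t_pdf(x, nu, scale)
        print(f"x={x}: mixture={mixture:.6f}, student_t={closed_form:.6f}")
```

In this simplified setting, small weights inflate the variance of individual samples, so the marginal has heavier tails than a Gaussian. That is consistent with the robustness the Summary attributes to the weighted-variance model, though the sketch itself says nothing about the paper's training or enhancement algorithms.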