Variational Speech Waveform Compression to Catalyze Semantic Communications

Bibliographic Details
Published in	2023 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1-6
Main Authors	Yao, Shengshi; Xiao, Zixuan; Wang, Sixian; Dai, Jincheng; Niu, Kai; Zhang, Ping
Format	Conference Proceeding
Language	English
Published	IEEE 01.03.2023
Subjects	Codecs; Nonlinear distortion; Probabilistic logic; Quantization (signal); Semantics; Speech recognition; Transforms

Summary:	We propose a novel neural waveform compression method to catalyze emerging speech semantic communications. By introducing nonlinear transforms and variational modeling, we effectively capture the dependencies within speech frames and estimate the probability distribution of the speech features more accurately, which yields better compression performance. In particular, speech signals are analyzed and synthesized by a pair of nonlinear transforms, yielding latent features. An entropy model with a hyperprior is built to capture the probability distribution of the latent features, followed by quantization and entropy coding. The proposed waveform codec can be optimized flexibly for an arbitrary rate; another appealing feature is that it can be easily optimized for any differentiable loss function, including the perceptual losses used in semantic communications. To further improve speech quality, we incorporate residual coding to mitigate the degradation arising from quantization distortion in the latent space. Results indicate that, at the same perceptual quality score, the proposed method reduces the coding rate by up to 27% compared with the widely used adaptive multi-rate wideband (AMR-WB) codec as well as emerging neural waveform coding methods.
ISSN:	1558-2612
DOI:	10.1109/WCNC55385.2023.10118921
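
Illustrative note: the summary describes quantizing latent features, estimating their rate with an entropy model, and optimizing a rate-distortion trade-off. The sketch below shows that arithmetic in NumPy under stated assumptions and is not the authors' implementation. It assumes a Gaussian entropy model whose per-element mean and scale are passed in directly (in a hyperprior design these would be predicted from side information), uses mean squared error as a stand-in for whatever differentiable loss is chosen, and all function names and the weight `lam` are hypothetical.

```python
import math
import numpy as np

# Element-wise error function; NumPy itself does not provide erf.
_erf = np.vectorize(math.erf)


def gaussian_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + _erf((x - mean) / (scale * math.sqrt(2.0))))


def quantize(latent):
    """Uniform scalar quantization to the nearest integer."""
    return np.round(latent)


def estimated_bits(symbols, mean, scale, eps=1e-9):
    """Estimated code length in bits for integer symbols.

    Each symbol's probability is the Gaussian mass of the unit-width
    bin centered on it (assumed entropy model, see the note above).
    """
    upper = gaussian_cdf(symbols + 0.5, mean, scale)
    lower = gaussian_cdf(symbols - 0.5, mean, scale)
    prob = np.maximum(upper - lower, eps)  # guard against log2(0)
    return float(np.sum(-np.log2(prob)))


def rate_distortion_loss(x, x_hat, bits, lam):
    """Bits per sample plus lam times mean squared error (assumed form)."""
    distortion = float(np.mean((x - x_hat) ** 2))
    return bits / x.size + lam * distortion


if __name__ == "__main__":
    latent = np.array([0.4, 1.6, -2.2])
    symbols = quantize(latent)
    bits = estimated_bits(symbols, mean=0.0, scale=1.0)
    print(symbols, bits)
```

A smaller `scale` concentrates probability on the predicted value, so symbols that match the prediction cost fewer bits; this is the sense in which a more accurate distribution estimate lowers the coding rate.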