HVQ-VAE: Variational auto-encoder with hyperbolic vector quantization
Published in: Computer Vision and Image Understanding, Vol. 258, p. 104392
Format: Journal Article
Language: English
Publisher: Elsevier Inc.
Published: 01.07.2025
ISSN: 1077-3142
DOI: 10.1016/j.cviu.2025.104392
Summary: Vector quantized-variational autoencoder (VQ-VAE) and its variants have made significant progress in creating a discrete latent space by learning a codebook. Previous work on VQ-VAE has focused on discrete latent spaces in Euclidean or spherical spaces. This paper studies the geometric prior of hyperbolic spaces as a way to improve the learning capacity of VQ-VAE. However, working with VQ-VAE in hyperbolic space is not without difficulties, and the benefits of using hyperbolic space as the geometric prior for the latent space have never been studied in VQ-VAE. We bridge this gap by developing a VQ-VAE with hyperbolic vector quantization: the hyperbolic VQ-VAE (HVQ-VAE), which learns the latent embedding of the data and the codebook in hyperbolic space. Specifically, we embed the discrete latent space in the Poincaré ball, so that the clustering algorithm can be formulated and optimized in the Poincaré ball. Thorough experiments against various baselines evaluate the proposed HVQ-VAE empirically. We show that HVQ-VAE achieves better image reconstruction, more effective codebook usage, and faster convergence than the baselines. We also present evidence that HVQ-VAE outperforms VQ-VAE in low-dimensional latent spaces.
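The quantization step described in the summary replaces the Euclidean nearest-neighbour lookup of standard VQ-VAE with one under the Poincaré distance. The following is a minimal PyTorch sketch of that idea; the function names, the numerical clamping, and the straight-through gradient trick are assumptions drawn from common VQ-VAE practice, not the authors' released code.

```python
import torch

def poincare_distance(x, y, eps=1e-6):
    """Pairwise Poincare-ball distance between rows of x (N, D) and y (K, D).

    d(x, y) = arccosh(1 + 2 * ||x - y||^2 / ((1 - ||x||^2) * (1 - ||y||^2)))
    Both inputs are assumed to lie inside the open unit ball.
    """
    x2 = x.pow(2).sum(dim=-1, keepdim=True)    # (N, 1)
    y2 = y.pow(2).sum(dim=-1, keepdim=True).T  # (1, K)
    sq_dist = x2 - 2.0 * (x @ y.T) + y2        # (N, K) squared Euclidean distances
    denom = (1.0 - x2).clamp_min(eps) * (1.0 - y2).clamp_min(eps)
    return torch.acosh((1.0 + 2.0 * sq_dist / denom).clamp_min(1.0 + eps))

def hyperbolic_quantize(z, codebook):
    """Assign each latent in z (N, D) to its nearest code in codebook (K, D)."""
    idx = poincare_distance(z, codebook).argmin(dim=-1)
    z_q = codebook[idx]
    # Straight-through estimator: copy gradients from z_q to z, as in standard VQ-VAE.
    return z + (z_q - z).detach(), idx
```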
Highlights:
• This paper introduces HVQ-VAE, enhancing the traditional VQ-VAE with geometric priors.
• The encoder of HVQ-VAE can learn the inherent hierarchical structures from the data.
• Demonstrates superior image reconstruction and learning efficiency.
• HVQ-VAE's geometric prior leads to higher codebook usage, faster convergence, and improved performance.
• HVQ-VAE employs Riemannian optimization to update the codebook within hyperbolic space (see the sketch after this list).
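The last highlight refers to updating the codebook with Riemannian optimization. Below is a minimal sketch of one Riemannian SGD step on the Poincaré ball: the Euclidean gradient is rescaled by the inverse of the ball's conformal metric, and the updated point is retracted back into the open unit ball. The learning rate, projection margin, and manual-step interface are illustrative assumptions, not the paper's exact optimizer.

```python
import torch

def riemannian_sgd_step(codebook, lr=0.05, max_norm=1.0 - 1e-5):
    """One (hypothetical) Riemannian SGD step for a codebook on the Poincare ball."""
    with torch.no_grad():
        # Conformal rescaling: Riemannian grad = ((1 - ||x||^2)^2 / 4) * Euclidean grad.
        scale = (1.0 - codebook.pow(2).sum(dim=-1, keepdim=True)).pow(2) / 4.0
        codebook -= lr * scale * codebook.grad
        # Retraction: pull any code that left the unit ball back inside it.
        norms = codebook.norm(dim=-1, keepdim=True)
        codebook *= torch.where(norms >= 1.0, max_norm / norms, torch.ones_like(norms))
        codebook.grad.zero_()
```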