Autoregressive Image Generation without Vector Quantization
Main Authors | Tianhong Li, Yonglong Tian, He Li, Mingyang Deng, Kaiming He |
---|---|
Format | Journal Article |
Language | English |
Published | 17.06.2024 |
Summary: Conventional wisdom holds that autoregressive models for image generation are
typically accompanied by vector-quantized tokens. We observe that while a
discrete-valued space can facilitate representing a categorical distribution,
it is not a necessity for autoregressive modeling. In this work, we propose to
model the per-token probability distribution using a diffusion procedure, which
allows us to apply autoregressive models in a continuous-valued space. Rather
than using categorical cross-entropy loss, we define a Diffusion Loss function
to model the per-token probability. This approach eliminates the need for
discrete-valued tokenizers. We evaluate its effectiveness across a wide range
of cases, including standard autoregressive models and generalized masked
autoregressive (MAR) variants. By removing vector quantization, our image
generator achieves strong results while enjoying the speed advantage of
sequence modeling. We hope this work will motivate the use of autoregressive
generation in other continuous-valued domains and applications. Code is
available at: https://github.com/LTH14/mar.
DOI: 10.48550/arxiv.2406.11838
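
The abstract describes replacing the categorical cross-entropy loss with a Diffusion Loss that models each continuous-valued token's distribution, conditioned on the autoregressive model's output. The sketch below is a minimal illustration of that idea in PyTorch; the names (SimpleDenoiser, diffusion_loss), the dimensions, and the linear noise schedule are illustrative assumptions, not the API of the LTH14/mar repository, which holds the authors' actual implementation.

```python
# Minimal sketch of a Diffusion Loss for continuous tokens (illustrative, not the repo's API).
# A small denoising MLP, conditioned on the AR model's per-token output z, learns to predict
# the noise added to a continuous token x0; the MSE on that prediction replaces cross-entropy.
import torch
import torch.nn as nn


class SimpleDenoiser(nn.Module):
    """MLP that predicts the noise added to a continuous token,
    conditioned on the AR output vector z and the diffusion timestep t."""

    def __init__(self, token_dim=16, cond_dim=768, hidden=1024, num_steps=1000):
        super().__init__()
        self.t_embed = nn.Embedding(num_steps, hidden)
        self.net = nn.Sequential(
            nn.Linear(token_dim + cond_dim + hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, token_dim),
        )

    def forward(self, x_t, t, z):
        h = torch.cat([x_t, z, self.t_embed(t)], dim=-1)
        return self.net(h)  # predicted noise epsilon


def diffusion_loss(denoiser, x0, z, num_steps=1000):
    """Diffusion Loss for a batch of continuous tokens x0 with conditions z.
    Uses a simple linear beta schedule for illustration."""
    betas = torch.linspace(1e-4, 0.02, num_steps, device=x0.device)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)

    t = torch.randint(0, num_steps, (x0.shape[0],), device=x0.device)
    eps = torch.randn_like(x0)
    a_bar = alpha_bars[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps  # forward noising step

    eps_pred = denoiser(x_t, t, z)
    return ((eps_pred - eps) ** 2).mean()  # epsilon-prediction MSE


# Example usage with random tensors (shapes are illustrative):
denoiser = SimpleDenoiser(token_dim=16, cond_dim=768)
x0 = torch.randn(8, 16)   # continuous tokens from a non-quantized tokenizer
z = torch.randn(8, 768)   # per-token conditioning vectors from the AR backbone
loss = diffusion_loss(denoiser, x0, z)
```

At sampling time, the same conditioned denoiser would be run in reverse to draw a continuous token from its per-token distribution, which is what lets the autoregressive model operate without a discrete codebook.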