Reinforcement Learning from Diffusion Feedback: Q* for Image Search
Format: Journal Article
Language: English
Published: 27.11.2023
Summary: Large vision-language models are steadily gaining personalization capabilities at the cost of fine-tuning or data augmentation. We present two models for image generation using model-agnostic learning that align semantic priors with generative capabilities. RLDF, or Reinforcement Learning from Diffusion Feedback, is a singular approach for visual imitation through prior-preserving reward function guidance. It employs Q-learning (with standard Q*) for generation and follows a semantic-rewarded trajectory for image search through finite encoding-tailored actions. The second proposed method, noisy diffusion gradient, is optimization driven. At the root of both methods is a special CFG encoding that we propose for continual semantic guidance. Using only a single input image and no text input, RLDF generates high-quality images over varied domains including retail, sports, and agriculture, showcasing class-consistency and strong visual diversity. The project website is available at https://infernolia.github.io/RLDF.
DOI: 10.48550/arxiv.2311.15648
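To make the abstract's reinforcement-learning framing concrete, the sketch below runs a standard tabular Q-learning loop over a small discrete encoding space, where each action edits one position of the encoding and the reward is a stand-in semantic similarity score. The encoding layout, the `embed` and `reward` stubs, and all hyperparameters are illustrative assumptions, not the paper's implementation; per the abstract, RLDF derives its reward guidance from diffusion feedback rather than a toy similarity score.

```python
# Illustrative sketch only: tabular Q-learning over a toy discrete encoding
# space with a stand-in semantic reward. Not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

N_TOKENS = 8                      # hypothetical encoding vocabulary size
ENC_LEN = 4                       # hypothetical encoding length (the "state")
N_ACTIONS = ENC_LEN * N_TOKENS    # each action sets one position to one token

reference = rng.normal(size=N_TOKENS)   # stand-in for a target image embedding

def embed(encoding):
    # Stand-in for a vision/diffusion encoder: normalized token histogram.
    hist = np.bincount(encoding, minlength=N_TOKENS).astype(float)
    return hist / np.linalg.norm(hist)

def reward(encoding):
    # Stand-in semantic reward: cosine similarity to the reference embedding.
    return float(embed(encoding) @ (reference / np.linalg.norm(reference)))

def step(encoding, action):
    # Decode the action as (position, token) and apply the edit.
    pos, tok = divmod(action, N_TOKENS)
    nxt = encoding.copy()
    nxt[pos] = tok
    return nxt

def key(encoding):
    return tuple(encoding)

Q = {}                            # Q-table: state -> array of action values
alpha, gamma, eps = 0.1, 0.9, 0.2

for episode in range(500):
    enc = rng.integers(0, N_TOKENS, size=ENC_LEN)
    for t in range(20):
        q_row = Q.setdefault(key(enc), np.zeros(N_ACTIONS))
        # Epsilon-greedy action selection.
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(q_row.argmax())
        nxt = step(enc, a)
        r = reward(nxt)
        q_next = Q.setdefault(key(nxt), np.zeros(N_ACTIONS))
        # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
        q_row[a] += alpha * (r + gamma * q_next.max() - q_row[a])
        enc = nxt

print("final encoding:", enc, "reward:", round(reward(enc), 3))
```

The greedy trajectory induced by the learned Q-table is the "semantic-rewarded trajectory" analogue in this toy setting: starting from a random encoding and repeatedly taking the argmax action drives the encoding toward states that score highly under the reward.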