A$^\text{T}$A: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
Format | Journal Article |
---|---|
Language | English |
Published | 02.04.2025 |
Summary: Image inpainting aims to fill the missing region of an image. Recently, there
has been a surge of interest in foreground-conditioned background inpainting, a
sub-task that fills in the background of an image given the foreground subject and
an associated text prompt. Existing background inpainting methods typically
preserve the subject's original position from the source image exactly,
resulting in inconsistencies between the subject and the generated background.
To address this challenge, we propose a new task, "Text-Guided Subject-Position
Variable Background Inpainting", which aims to dynamically adjust the subject's
position so that the subject and the inpainted background form a harmonious
whole, together with the Adaptive Transformation Agent (A$^\text{T}$A) for this
task. First, we design a PosAgent Block that adaptively predicts an appropriate
displacement from the given features, making the subject's position variable.
Second, we design the Reverse Displacement Transform (RDT) module, which arranges
multiple PosAgent blocks in a reverse structure to transform hierarchical feature
maps from deep to shallow based on semantic information. Third, we equip
A$^\text{T}$A with a Position Switch Embedding that controls whether the
subject's position in the generated image is adaptively predicted or fixed.
Extensive comparative experiments validate the effectiveness of A$^\text{T}$A:
it demonstrates superior inpainting capability in subject-position variable
inpainting while also performing well on subject-position fixed inpainting.
DOI: 10.48550/arxiv.2504.01603
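The abstract describes the PosAgent Block, the RDT module and the Position Switch Embedding only at a high level. What follows is a minimal, illustrative sketch of how those three ideas could fit together. It is not the authors' implementation. Every detail below is a hypothetical assumption:

- each feature map is a single-channel grid, and each pyramid level doubles the resolution of the one before it;
- a PosAgent is stood in for by global average pooling plus a linear head with a tanh output;
- displacements are applied as integer translations with zero fill;
- the displacement from each deeper level is rescaled, then refined by the next shallower PosAgent;
- the Position Switch Embedding is reduced to a binary gate that zeroes the displacement in fixed mode.

The names `PosAgent`, `shift_map` and `reverse_displacement_transform` are invented for this example.

```python
import math
from typing import List, Tuple

# Hypothetical simplification: one channel per feature map, stored as H x W rows.
FeatureMap = List[List[float]]


def shift_map(fmap: FeatureMap, dy: int, dx: int) -> FeatureMap:
    """Translate fmap by (dy, dx) pixels; vacated cells are filled with 0.0."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands at (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = fmap[sy][sx]
    return out


class PosAgent:
    """Hypothetical stand-in for a PosAgent block: features -> (dy, dx) displacement."""

    def __init__(self, weights: Tuple[float, float], bias: Tuple[float, float],
                 max_shift: float) -> None:
        self.weights = weights
        self.bias = bias
        self.max_shift = max_shift

    def predict(self, fmap: FeatureMap) -> Tuple[float, float]:
        # Global average pooling reduces the map to one scalar descriptor.
        pooled = sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        # Linear head + tanh keeps each component within [-max_shift, max_shift].
        dy = self.max_shift * math.tanh(self.weights[0] * pooled + self.bias[0])
        dx = self.max_shift * math.tanh(self.weights[1] * pooled + self.bias[1])
        return dy, dx


def reverse_displacement_transform(
    pyramid: List[FeatureMap], agents: List[PosAgent], adaptive: bool
) -> Tuple[List[FeatureMap], List[Tuple[int, int]]]:
    """Apply PosAgents from the deepest (coarsest) level to the shallowest.

    Assumes pyramid[0] is the deepest map and each later level doubles the
    resolution. Returns the shifted maps and the integer shift used per level.
    """
    # Binary gate standing in for the Position Switch Embedding:
    # 0.0 keeps the subject position fixed, 1.0 lets the agents move it.
    switch = 1.0 if adaptive else 0.0
    dy_acc, dx_acc = 0.0, 0.0
    out: List[FeatureMap] = []
    shifts: List[Tuple[int, int]] = []
    for level, (fmap, agent) in enumerate(zip(pyramid, agents)):
        if level > 0:
            # Rescale the coarser displacement to this level's resolution.
            dy_acc, dx_acc = 2.0 * dy_acc, 2.0 * dx_acc
        # Each agent predicts a residual refinement of the inherited displacement.
        ddy, ddx = agent.predict(fmap)
        dy_acc += switch * ddy
        dx_acc += switch * ddx
        iy, ix = round(dy_acc), round(dx_acc)
        out.append(shift_map(fmap, iy, ix))
        shifts.append((iy, ix))
    return out, shifts
```

Calling `reverse_displacement_transform(pyramid, agents, adaptive=False)` returns every map unchanged, which mirrors the subject-position fixed mode. Rounding to integer offsets keeps the sketch dependency-free, but rounding is not differentiable. A trainable version would need a differentiable resampling step, such as bilinear grid sampling.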