Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
Main Authors | , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 27.11.2023 |
Subjects | |
Online Access | Get full text |
Summary: | This study focuses on a novel task in text-to-image (T2I) generation, namely
action customization. The objective of this task is to learn the co-existing
action from limited data and generalize it to unseen humans or even animals.
Experimental results show that existing subject-driven customization methods
fail to learn the representative characteristics of actions and struggle to
decouple actions from context features, including appearance. To overcome the
preference for low-level features and the entanglement of high-level features,
we propose an inversion-based method, Action-Disentangled Identifier (ADI), to
learn action-specific identifiers from the exemplar images. ADI first expands
the semantic conditioning space by introducing layer-wise identifier tokens,
thereby increasing the representational richness while distributing the
inversion across different features. Then, to block the inversion of
action-agnostic features, ADI extracts the gradient invariance from the
constructed sample triples and masks the updates of irrelevant channels. To
comprehensively evaluate the task, we present ActionBench, which includes a
variety of actions, each accompanied by meticulously selected samples. Both
quantitative and qualitative results show that ADI outperforms existing
baselines in action-customized T2I generation. Our project page is at
https://adi-t2i.github.io/ADI. |
---|---|
DOI: | 10.48550/arxiv.2311.15841 |