Focus-Consistent Multi-Level Aggregation for Compositional Zero-Shot Learning
Main Authors:
Format: Journal Article
Language: English
Published: 30.08.2024
Summary: To transfer knowledge from seen attribute-object compositions to recognize unseen ones, recent compositional zero-shot learning (CZSL) methods mainly discuss the optimal classification branches for identifying the elements, leading to the popularity of a three-branch architecture. However, these methods conflate the underlying relationships among the branches with respect to consistency and diversity. Specifically, consistently providing the highest-level features to all three branches makes it harder to distinguish classes that are superficially similar. Furthermore, a single branch may focus on suboptimal regions when spatial information is not shared among the personalized branches. Recognizing these issues and endeavoring to address them, we propose a novel method called Focus-Consistent Multi-Level Aggregation (FOMA). Our method incorporates a Multi-Level Feature Aggregation (MFA) module to generate personalized features for each branch based on the image content. Additionally, a Focus-Consistent Constraint encourages a consistent focus on informative regions, thereby implicitly exchanging spatial information between all branches. Extensive experiments on three benchmark datasets (UT-Zappos, C-GQA, and Clothing16K) demonstrate that FOMA outperforms state-of-the-art methods.
DOI: 10.48550/arxiv.2408.17083
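The record only summarizes the architecture, so the sketch below is a rough, non-authoritative illustration of the two ideas named in the abstract: a content-dependent multi-level feature aggregation per branch and a focus-consistency term across branches. The class and function names (`MultiLevelAggregation`, `focus_consistency_loss`), the gating design, and the pairwise-KL form of the constraint are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch only -- the paper's exact FOMA implementation is not part
# of this record; module names, shapes, and the loss form are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiLevelAggregation(nn.Module):
    """Mix pooled backbone features from several levels with content-dependent
    weights, yielding a personalized feature for one classification branch."""

    def __init__(self, num_levels: int, dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(num_levels * dim, num_levels),
            nn.Softmax(dim=-1),
        )

    def forward(self, level_feats):
        # level_feats: list of (B, dim) tensors, one per backbone level.
        stacked = torch.stack(level_feats, dim=1)           # (B, L, dim)
        weights = self.gate(stacked.flatten(1))             # (B, L)
        return (weights.unsqueeze(-1) * stacked).sum(1)     # (B, dim)


def focus_consistency_loss(attention_maps):
    """Penalize disagreement between the branches' spatial attention maps
    (assumed form: mean pairwise KL divergence over normalized maps)."""
    log_probs = [F.log_softmax(a.flatten(1), dim=-1) for a in attention_maps]
    loss, pairs = 0.0, 0
    for i in range(len(log_probs)):
        for j in range(len(log_probs)):
            if i != j:
                loss = loss + F.kl_div(
                    log_probs[i], log_probs[j].exp(), reduction="batchmean"
                )
                pairs += 1
    return loss / max(pairs, 1)


if __name__ == "__main__":
    # Toy shapes: batch of 2, three feature levels of width 8, 7x7 attention maps
    # standing in for the attribute, object, and composition branches.
    mfa = MultiLevelAggregation(num_levels=3, dim=8)
    feats = [torch.randn(2, 8) for _ in range(3)]
    print(mfa(feats).shape)                                 # torch.Size([2, 8])
    maps = [torch.randn(2, 7, 7) for _ in range(3)]
    print(focus_consistency_loss(maps).item())
```

In this reading, each branch would own its own aggregation module so it can weight backbone levels differently per image, while the consistency term couples the branches only through where they attend, matching the abstract's claim of implicit spatial information exchange.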