Learning Concept-Based Causal Transition and Symbolic Reasoning for Visual Planning
Main Authors | , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 05.10.2023 |
Summary: | Visual planning simulates how humans make decisions to achieve desired goals
in the form of searching for visual causal transitions between an initial
visual state and a final visual goal state. It has become increasingly
important in egocentric vision with its advantages in guiding agents to perform
daily tasks in complex environments. In this paper, we propose an interpretable
and generalizable visual planning framework consisting of i) a novel
Substitution-based Concept Learner (SCL) that abstracts visual inputs into
disentangled concept representations, ii) symbol abstraction and reasoning that
performs task planning via the self-learned symbols, and iii) a Visual Causal
Transition model (ViCT) that grounds visual causal transitions to semantically
similar real-world actions. Given an initial state, we perform goal-conditioned
visual planning with a symbolic reasoning method fueled by the learned
representations and causal transitions to reach the goal state. To verify the
effectiveness of the proposed model, we collect a large-scale visual planning
dataset based on AI2-THOR, dubbed CCTP. Extensive experiments on this
challenging dataset demonstrate the superior performance of our method in
visual task planning. Empirically, we show that our framework can generalize to
unseen task trajectories, unseen object categories, and real-world data.
Further details of this work are provided at
https://fqyqc.github.io/ConTranPlan/. |
DOI: | 10.48550/arxiv.2310.03325 |
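The summary describes goal-conditioned planning as a search over self-learned symbols and causal transitions. As a rough illustration of that general idea only, and not the paper's actual SCL/ViCT implementation, the sketch below runs a breadth-first search over symbolic states using a hand-written transition table; all predicates, action names, and the `transitions` structure are hypothetical stand-ins invented for this example.

```python
# Illustrative sketch: breadth-first search over abstract symbolic states,
# mirroring the kind of goal-conditioned symbolic planning the summary describes.
# The predicates and the `transitions` table are hypothetical, not the paper's.
from collections import deque

# A symbolic state is a frozenset of concept predicates, e.g. {"bread:sliced"}.
State = frozenset

# Hypothetical learned causal transitions: action -> (preconditions, add, delete).
transitions = {
    "pick_knife":  (set(), {"knife:held"}, set()),
    "slice_bread": ({"bread:whole", "knife:held"}, {"bread:sliced"}, {"bread:whole"}),
    "toast_bread": ({"bread:sliced"}, {"bread:toasted"}, set()),
}

def applicable(state: State, pre: set) -> bool:
    # An action is applicable if all its preconditions hold in the state.
    return pre <= state

def apply_action(state: State, add: set, delete: set) -> State:
    # Effect of an action: remove deleted predicates, then add new ones.
    return State((set(state) - delete) | add)

def plan(initial: State, goal: set):
    """Breadth-first search for an action sequence that turns `initial`
    into a state containing every goal predicate."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, actions = frontier.popleft()
        if goal <= state:
            return actions
        for name, (pre, add, delete) in transitions.items():
            if applicable(state, pre):
                nxt = apply_action(state, add, delete)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, actions + [name]))
    return None  # no plan found

if __name__ == "__main__":
    start = State({"bread:whole"})
    print(plan(start, goal={"bread:toasted"}))
    # -> ['pick_knife', 'slice_bread', 'toast_bread']
```

Unlike this purely symbolic toy, the framework in the paper abstracts the symbols from visual inputs and grounds the causal transitions in visual states corresponding to real-world actions, as stated in the summary above.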