Visual Commonsense Causal Reasoning From a Still Image

Bibliographic Details
Published in: IEEE Access, Vol. 13, pp. 85084-85097
Main Authors: Wu, Xiaojing; Guo, Rui; Li, Qin; Zhu, Ning
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
Summary: Even from a still image, humans exhibit the ability to ratiocinate diverse visual cause-and-effect relationships of events preceding, succeeding, and extending beyond the given image scope. Previous work on commonsense causal reasoning (CCR) aimed at understanding general causal dependencies among common events in natural language descriptions. However, in real-world scenarios, CCR is fundamentally a multisensory task and is more susceptible to spurious correlations, given that commonsense causal relationships manifest in various modalities and involve multiple sources of confounders. In this work, to the best of our knowledge, we present the first comprehensive study focusing on visual commonsense causal reasoning (VCCR) within the potential outcomes framework. By drawing parallels between vision-language data and human subjects in the observational study, we tailor a foundational framework, VCC-Reasoner, for detecting implicit visual commonsense causation. It combines inverse propensity score weighting and outcome regression, offering dual robust estimates of the average treatment effect. Empirical evidence underscores the efficacy and superiority of VCC-Reasoner, showcasing its outstanding VCCR capabilities.
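For readers unfamiliar with the estimation strategy named in the abstract, the sketch below illustrates the general idea of combining inverse propensity score weighting with outcome regression to obtain a doubly robust estimate of the average treatment effect (the AIPW estimator). It is a minimal, generic illustration on synthetic tabular data; the function name, models, and data are assumptions for exposition and do not reproduce the VCC-Reasoner implementation described in the paper.

```python
# Minimal AIPW (doubly robust) ATE sketch: combines inverse propensity score
# weighting with outcome regression, so the estimate remains consistent if
# either the propensity model or the outcome model is correctly specified.
# Illustrative only; not the paper's VCC-Reasoner implementation.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression


def doubly_robust_ate(X, t, y):
    """Estimate the average treatment effect from covariates X,
    binary treatment t, and outcome y using augmented IPW."""
    # Propensity model: P(T = 1 | X), clipped to avoid extreme weights
    e = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    e = np.clip(e, 0.01, 0.99)

    # Outcome regression models fit separately on treated and control units
    mu1 = LinearRegression().fit(X[t == 1], y[t == 1]).predict(X)
    mu0 = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)

    # AIPW estimator: outcome-regression prediction plus an
    # inverse-propensity-weighted residual correction
    return np.mean(
        mu1 - mu0
        + t * (y - mu1) / e
        - (1 - t) * (y - mu0) / (1 - e)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    X = rng.normal(size=(n, 3))
    propensity = 1 / (1 + np.exp(-X[:, 0]))  # confounded treatment assignment
    t = rng.binomial(1, propensity)
    y = 2.0 * t + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)
    print(f"Estimated ATE: {doubly_robust_ate(X, t, y):.3f} (true effect = 2.0)")
```

The appeal of this construction, and presumably the reason the authors adopt a dual estimator, is its double robustness: the residual-correction term cancels bias from a misspecified outcome model whenever the propensity model is correct, and vice versa.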
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3558429