Entity Dependency Learning Network With Relation Prediction for Video Visual Relation Detection
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 34, No. 12, pp. 12425-12436
Main Authors: , , , ,
Format: Journal Article
Language: English
Published: New York: IEEE, 01.12.2024 (The Institute of Electrical and Electronics Engineers, Inc.)
Summary: Video Visual Relation Detection (VidVRD) is a pivotal task in video analysis. It involves detecting object trajectories in videos, predicting potential dynamic relations between these trajectories, and ultimately representing these relations as <subject, predicate, object> triplets. Correct relation prediction is vital for VidVRD. Existing methods mostly adopt a simple fusion of the visual and language features of entity trajectories as the feature representation for relation predicates. However, these methods do not take into account the dependency information between the relation predicate and the subject and object within the triplet. To address this issue, we propose the Entity Dependency Learning Network (EDLN), which captures the dependency information between relation predicates and subjects, objects, and subject-object pairs, and adaptively integrates this dependency information into the feature representation of relation predicates. Additionally, to effectively model the relations between various pairs of object entities, we introduce a fully convolutional encoding approach in the context-encoding phase for relation-predicate features, as a substitute for the self-attention mechanism in the Transformer. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed EDLN.
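The abstract names two mechanisms: adaptive integration of subject/object dependency cues into the predicate representation, and a fully convolutional context encoder standing in for Transformer self-attention. The PyTorch sketch below illustrates what such components could look like; it is not the authors' implementation, and all module names, gating choices, kernel sizes, and dimensions (`DependencyFusion`, `ConvContextEncoder`, `dim=256`, etc.) are assumptions made for illustration.

```python
# Minimal sketch of the two ideas in the abstract (assumed design, not the paper's code).
import torch
import torch.nn as nn

class DependencyFusion(nn.Module):
    """Adaptively injects subject, object, and subject-object dependency cues
    into the relation-predicate representation via learned sigmoid gates."""
    def __init__(self, dim):
        super().__init__()
        self.gate_s = nn.Linear(2 * dim, dim)   # predicate <- subject
        self.gate_o = nn.Linear(2 * dim, dim)   # predicate <- object
        self.gate_so = nn.Linear(3 * dim, dim)  # predicate <- subject-object pair
        self.proj = nn.Linear(4 * dim, dim)     # fuse predicate + three dependency terms

    def forward(self, pred, subj, obj):
        # Each gate weights how much of the entity feature flows into the predicate.
        d_s = torch.sigmoid(self.gate_s(torch.cat([pred, subj], -1))) * subj
        d_o = torch.sigmoid(self.gate_o(torch.cat([pred, obj], -1))) * obj
        d_so = torch.sigmoid(self.gate_so(torch.cat([pred, subj, obj], -1))) * (subj + obj)
        return self.proj(torch.cat([pred, d_s, d_o, d_so], -1))

class ConvContextEncoder(nn.Module):
    """Fully convolutional context encoder over the temporal axis,
    used here in place of Transformer self-attention."""
    def __init__(self, dim, kernel=3, layers=2):
        super().__init__()
        blocks = []
        for _ in range(layers):
            blocks += [nn.Conv1d(dim, dim, kernel, padding=kernel // 2), nn.ReLU()]
        self.net = nn.Sequential(*blocks)

    def forward(self, x):
        # x: (batch, time, dim) -> Conv1d expects (batch, dim, time).
        return self.net(x.transpose(1, 2)).transpose(1, 2)

# Smoke test with random trajectory features.
B, T, D = 2, 8, 256
pred, subj, obj = (torch.randn(B, T, D) for _ in range(3))
fused = DependencyFusion(D)(pred, subj, obj)   # (2, 8, 256)
context = ConvContextEncoder(D)(fused)         # (2, 8, 256)
print(fused.shape, context.shape)
```

A convolutional encoder of this kind trades the global receptive field of self-attention for local temporal context at lower cost; stacking layers widens the receptive field, which may suffice for the short-range dynamics between trajectory pairs.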
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3437437