Entity Dependency Learning Network With Relation Prediction for Video Visual Relation Detection

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 34, No. 12, pp. 12425-12436
Main Authors: Zhang, Guoguang; Tang, Yepeng; Zhang, Chunjie; Zheng, Xiaolong; Zhao, Yao
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2024
Summary: Video Visual Relation Detection (VidVRD) is a pivotal task in the field of video analysis. It involves detecting object trajectories in videos, predicting potential dynamic relations between these trajectories, and ultimately representing these relationships in the form of <subject, predicate, object> triplets. Correct prediction of relations is vital for VidVRD. Existing methods mostly adopt a simple fusion of the visual and language features of entity trajectories as the feature representation for relation predicates. However, these methods do not take into account the dependency information between the relation predicate and the subject and object within the triplet. To address this issue, we propose the entity dependency learning network (EDLN), which captures the dependency information between relation predicates and subjects, objects, and subject-object pairs, and adaptively integrates this dependency information into the feature representation of relation predicates. Additionally, to effectively model the features of the relations between various entity pairs, we introduce a fully convolutional encoding approach as a substitute for the self-attention mechanism in the Transformer during the context encoding phase for relation predicate features. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed EDLN.
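The abstract describes two mechanisms at a high level: fusing subject/object dependency cues into the predicate feature, and replacing Transformer self-attention with fully convolutional context encoding. The PyTorch-style sketch below is a hypothetical illustration of those two ideas under stated assumptions (the module names, the gating scheme, and the kernel size are mine, not the authors' implementation).

```python
# Hypothetical sketch, not the authors' code: (1) adaptively gate subject,
# object, and subject-object pair features into the predicate representation;
# (2) encode context with a 1-D fully convolutional block instead of self-attention.
import torch
import torch.nn as nn

class DependencyFusion(nn.Module):
    """Mixes subject, object, and pair features into the predicate feature
    with learned, input-dependent gates (assumed fusion scheme)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.ModuleDict({k: nn.Linear(dim, dim) for k in ("subj", "obj", "pair")})
        self.gate = nn.Linear(4 * dim, 3)  # one gate per dependency source

    def forward(self, pred, subj, obj, pair):
        gates = torch.softmax(self.gate(torch.cat([pred, subj, obj, pair], dim=-1)), dim=-1)
        fused = pred
        for i, (k, v) in enumerate({"subj": subj, "obj": obj, "pair": pair}.items()):
            fused = fused + gates[..., i:i + 1] * self.proj[k](v)
        return fused

class ConvContextEncoder(nn.Module):
    """1-D fully convolutional encoder over a sequence of predicate features,
    standing in for a Transformer self-attention block."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size, padding=pad),
            nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size, padding=pad),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, seq_len, dim); Conv1d expects (batch, dim, seq_len)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm(x + y)  # residual connection

# Toy usage: 8 candidate relation instances with 256-d features.
dim = 256
pred, subj, obj, pair = (torch.randn(8, dim) for _ in range(4))
fused = DependencyFusion(dim)(pred, subj, obj, pair)   # (8, 256)
context = ConvContextEncoder(dim)(fused.unsqueeze(0))  # (1, 8, 256)
```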
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3437437