Decoupling Multimodal Transformers for Referring Video Object Segmentation

Referring Video Object Segmentation (RVOS) aims to segment the text-depicted object from video sequences. With excellent capabilities in long-range modelling and information interaction, transformers have been increasingly applied in existing RVOS architectures. To better leverage multimodal data, m...

Bibliographic Details
Published in	IEEE transactions on circuits and systems for video technology Vol. 33; no. 9; p. 1
Main Authors	Gao, Mingqi; Yang, Jinyu; Han, Jungong; Lu, Ke; Zheng, Feng; Montana, Giovanni
Format	Journal Article
Language	English
Published	New York IEEE 01.09.2023 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Alignment; Decoupled multimodal transformers; Decoupling; Referring video object segmentation; Segmentation; Transformers; Vision; Vision-language pre-training

Cover

Loading…

Be the first to leave a comment!