Decoupling Multimodal Transformers for Referring Video Object Segmentation

Referring Video Object Segmentation (RVOS) aims to segment the text-depicted object from video sequences. With excellent capabilities in long-range modelling and information interaction, transformers have been increasingly applied in existing RVOS architectures. To better leverage multimodal data, m...

Bibliographic Details
Published in IEEE transactions on circuits and systems for video technology Vol. 33; no. 9; p. 1
Main Authors Gao, Mingqi; Yang, Jinyu; Han, Jungong; Lu, Ke; Zheng, Feng; Montana, Giovanni
Format Journal Article
Language English
Published New York IEEE 01.09.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)