Decoupling Multimodal Transformers for Referring Video Object Segmentation
Referring Video Object Segmentation (RVOS) aims to segment the text-depicted object from video sequences. With excellent capabilities in long-range modelling and information interaction, transformers have been increasingly applied in existing RVOS architectures. To better leverage multimodal data, m...
Saved in:
Published in | IEEE transactions on circuits and systems for video technology Vol. 33; no. 9; p. 1 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published |
New York
IEEE
01.09.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Be the first to leave a comment!