Vman: visual-modified attention network for multimodal paradigms

Owing to its excellent dependency modeling and powerful parallel computing capabilities, the Transformer has become the primary research approach in vision-language tasks (VLT). However, for multimodal VLT such as visual question answering (VQA) and visual grounding (VG), which demand strong dependency modeling and comprehension of heterogeneous modalities, solving t...
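To make the abstract's setting concrete, below is a minimal sketch of the kind of cross-modal attention that Transformer-based VQA/VG models build on, where text tokens attend over visual features. This is a generic illustration only, not the Vman architecture described in the paper; the class name, dimensions, and residual/normalization layout are all illustrative assumptions.

```python
# A minimal cross-modal (text-to-image) attention sketch in PyTorch.
# NOTE: generic illustration of Transformer-style multimodal fusion;
# this is NOT the paper's Vman method. Names and sizes are assumptions.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        # Text tokens act as queries; visual regions act as keys/values.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, vision: torch.Tensor) -> torch.Tensor:
        # text:   (batch, n_tokens, dim)  -- question/phrase embeddings
        # vision: (batch, n_regions, dim) -- image region/grid features
        fused, _ = self.attn(query=text, key=vision, value=vision)
        # Residual connection plus layer norm, as in standard Transformers.
        return self.norm(text + fused)

# Example: 14 question tokens attending over 49 image-grid features.
txt = torch.randn(2, 14, 512)
img = torch.randn(2, 49, 512)
out = CrossModalAttention()(txt, img)  # -> shape (2, 14, 512)
```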


Bibliographic Details
Published in: The Visual Computer, Vol. 41, No. 4, pp. 2737–2754
Main Authors: Song, Xiaoyu; Han, Dezhi; Chen, Chongqing; Shen, Xiang; Wu, Huafeng
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.03.2025
