Vman: visual-modified attention network for multimodal paradigms
Due to its excellent dependency modeling and powerful parallel computing capabilities, the Transformer has become the primary research method in vision-language tasks (VLT). However, for multimodal VLT such as VQA and VG, which demand strong dependency modeling and heterogeneous modality comprehension, solving t...
| Published in | The Visual Computer, Vol. 41, no. 4, pp. 2737–2754 |
| --- | --- |
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | Berlin/Heidelberg: Springer Berlin Heidelberg, 01.03.2025 (Springer Nature B.V.) |