Deep Multi-Module Based Language Priors Mitigation Model for Visual Question Answering

TP391%TP3-05; The original intention of visual question answering (VQA) models is to infer the answer based on the relevant information of the question text in the visual image, but many VQA models often yield answers that are biased by some prior knowledge, especially the language priors. This paper pro...

Bibliographic Details
Published in	Journal of Donghua University (English Edition) (东华大学学报（英文版）) Vol. 40; no. 6; pp. 684-694
Main Authors	YU Shoujian, JIN Xueqin, WU Guowen, SHI Xiujin, ZHANG Hong
Format	Journal Article
Language	English
Published	31.12.2023
Subjects	multimodal fusion; visual question answering (VQA); natural language processing; language priors; computer vision
