The multi-modal fusion in visual question answering: a review of attention mechanisms
Visual Question Answering (VQA) is a significant cross-disciplinary problem spanning computer vision and natural language processing: given a picture and a question about that picture, a computer must output a natural language answer. This requires simultaneous processing o...
Published in | PeerJ. Computer science Vol. 9; p. e1400 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | United States: PeerJ. Ltd, 30.05.2023; PeerJ Inc |