FGCVQA: Fine-Grained Cross-Attention for Medical VQA


Bibliographic Details
Published in: 2023 IEEE International Conference on Image Processing (ICIP), pp. 975-979
Main Authors: Wu, Ziheng; Shu, Xinyao; Yan, Shiyang; Lu, Zhenyu
Format: Conference Proceeding
Language: English
Published: IEEE, 08.10.2023

Summary: The application of Visual Question Answering (VQA) in the medical field has significantly impacted traditional medical research methods. A mature medical VQA system can greatly assist patient diagnosis. VQA models from the generic domain are not compelling enough at aligning medical image features with text semantics, owing to the complex diversity of clinical questions and the difficulty of multi-modal reasoning. To address this, we propose a model called FGCVQA. It is essential to consider the semantic alignment of medical images and language features; specifically, we use a Cross-Modality Encoder to learn the semantic representation of medical images and texts, which improves multi-modal reasoning by accounting for fine-grained properties. Experimental results show that FGCVQA outperforms all previous methods on the VQA-RAD dataset of radiology images. FGCVQA effectively answers medical visual questions and can help doctors make better clinical analyses and diagnoses. The source code is available at https://github.com/wwzziheng/FGCVQA.
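The abstract does not spell out the Cross-Modality Encoder's equations, but encoders of this kind are typically built on cross-attention between the two modalities. As a rough illustration only (the function name, shapes, and single-direction design below are assumptions, not the paper's actual architecture), the following NumPy sketch shows question tokens attending over image-region features to produce image-aware text representations:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_feats, image_feats):
    """One direction of a cross-modality encoder (illustrative sketch).

    text_feats:  (n_tokens, d)  question-token embeddings
    image_feats: (n_regions, d) image-region embeddings
    Returns image-aware text features and the attention weights.
    """
    d = text_feats.shape[-1]
    # Each text token scores every image region (scaled dot product).
    scores = text_feats @ image_feats.T / np.sqrt(d)   # (n_tokens, n_regions)
    weights = softmax(scores, axis=-1)                 # rows sum to 1
    fused = weights @ image_feats                      # (n_tokens, d)
    return fused, weights

# Toy example: 5 question tokens attend over 9 image regions.
rng = np.random.default_rng(0)
txt = rng.normal(size=(5, 64))
img = rng.normal(size=(9, 64))
fused, w = cross_attention(txt, img)
```

In a full encoder this operation would use learned query/key/value projections, run in both directions (text-to-image and image-to-text), and be stacked with feed-forward layers; the sketch keeps only the attention core that makes fine-grained token-to-region alignment possible.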
DOI: 10.1109/ICIP49359.2023.10222540