Design and Development of Visual Question Answering model for the Visually Impaired

Bibliographic Details
Published in: 2023 International Conference on Recent Advances in Science and Engineering Technology (ICRASET), pp. 1-7
Main Authors: Shruthi G, Pradyumna Patil, Krishna Raj P M
Format: Conference Proceeding
Language: English
Published: IEEE, 23.11.2023

Summary: There is a need for diagnostic tests to assess progress and identify the issues involved in constructing deep learning systems that can reason about and respond to queries regarding visual input. The need to obtain feature descriptions of images on the Web presents a difficulty for people who are visually impaired. Visual Question Answering (VQA) aims to answer the questions of the blind: it is the study of machines that learn to extract features from input images and answer the questions a user poses about them. Existing benchmarks for visual question answering can be useful, but they contain strong biases that models can exploit to answer questions accurately without genuine reasoning. Additionally, such models conflate multiple sources of error, making it challenging to identify model flaws. The VQA system used in this project is an algorithm that takes an image together with a natural language question as input and produces a natural language answer as output. VQA is by nature a multi-disciplinary research problem. The model uses a dataset that assesses many aspects of visual reasoning; it has few biases and extensive annotations that describe the type of reasoning each question requires. The paper then analyses various contemporary visual reasoning systems, such as LSTM Q+I, EQ-1, and ALMA, which take different approaches to answering questions but are prone to overtraining and make answer prediction complex. To keep things simple, the model combines a CNN and an LSTM trained on the VQA-2.0 dataset, offering fresh perspectives on their strengths and weaknesses. Given the huge potential of VQA, further research is clearly needed.
DOI: 10.1109/ICRASET59632.2023.10419884
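
The summary describes fusing CNN image features with an LSTM encoding of the question and classifying over a fixed answer vocabulary. Below is a minimal sketch of that kind of model, assuming a PyTorch implementation; the class name CnnLstmVqa, the ResNet-18 backbone, the element-wise fusion, and all dimensions are illustrative assumptions, not the authors' actual architecture.

# A minimal sketch, assuming PyTorch; names and dimensions are
# illustrative choices, not the paper's implementation.
import torch
import torch.nn as nn
from torchvision import models

class CnnLstmVqa(nn.Module):
    def __init__(self, vocab_size, num_answers, embed_dim=300, hidden_dim=512):
        super().__init__()
        # Image channel: a ResNet-18 with its classifier head removed
        # (in practice pretrained weights would be loaded).
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 512, 1, 1)
        self.img_fc = nn.Linear(512, hidden_dim)
        # Question channel: word embeddings fed through an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Fused features are classified over a fixed answer vocabulary.
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, images, questions):
        img = torch.tanh(self.img_fc(self.cnn(images).flatten(1)))  # (B, hidden_dim)
        _, (h, _) = self.lstm(self.embed(questions))                # h: (1, B, hidden_dim)
        q = torch.tanh(h[-1])                                       # (B, hidden_dim)
        return self.classifier(img * q)  # element-wise fusion, then answer logits

# Dummy forward pass; vocabulary and answer-set sizes are placeholders.
model = CnnLstmVqa(vocab_size=10000, num_answers=1000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(1, 10000, (2, 14)))

The element-wise product fusion here follows the common LSTM Q+I-style baseline the summary cites; richer fusion schemes (concatenation, attention) are drop-in replacements for that single line.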