Improving Visual Reasoning Through Semantic Representation

Bibliographic Details
Published in	IEEE Access, Vol. 9, pp. 91476-91486
Main Authors	Zheng, Wenfeng; Liu, Xiangjun; Ni, Xubin; Yin, Lirong; Yang, Bo
Format	Journal Article
Language	English
Published	Piscataway: IEEE, 2021; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Cognition; Cognitive tasks; Decoding; Deep learning; Feature extraction; Knowledge representation; Machine learning; Machine translation; Object detection; Reasoning; Semantics; Task analysis; the semantic net; visual reasoning; Visualization; VQA
Summary:	In visual reasoning, advances in deep learning have significantly improved the accuracy of results. Image features are typically used as the input from which answers are derived. However, image features are too redundant for a model to learn accurate characterizations within limited complexity and time. In human reasoning, by contrast, an abstract description of an image is usually used to avoid irrelevant details. Inspired by this, a higher-level representation, named semantic representation, is introduced. This paper proposes a detailed visual reasoning model. The new model contains an image understanding model based on semantic representation, a feature extraction and processing model refined with the watershed and u-distance methods, a feature vector learning model using pyramidal pooling and a residual network, and a question understanding model combining a problem embedding coding method with a machine translation decoding method. The resulting feature vector better represents the whole image instead of focusing too heavily on specific characteristics. The model, which takes semantic representation as input, verifies that introducing a high-level semantic representation yields more accurate results. The results also show that it is feasible and effective to introduce high-level, abstract forms of knowledge representation into deep learning tasks. This study lays a theoretical and experimental foundation for introducing different levels of knowledge representation into deep learning in the future.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2021.3074937