Objects and scenes classification with selective use of central and peripheral image content
Published in | Journal of visual communication and image representation Vol. 66; p. 102698 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.01.2020 |
Subjects | |
Summary: | • En-HMAX adopts a relative order of importance depending on the image category. • Central vision corresponds to object recognition. • Peripheral vision corresponds to scene recognition. • Deep learning models exhibit similar trends towards object and scene images. • Only 50% of the relevant image data could achieve 90% object classification accuracy.
The human visual recognition system is more efficient than any current robotic vision setting. One reason for this superiority is that humans utilize different fields of vision, depending on the recognition task. For instance, experiments on human subjects show that peripheral vision is more useful than central vision in recognizing scenes. We tested our recently developed model, the elastic net-regularized hierarchical MAX (En-HMAX), in recognizing objects and scenes. In various experimental conditions, images were occluded with windows and scotomas of varying sizes. With this model, classification accuracies of up to 90% for objects and scenes were possible. Mirroring the human experiments, window and scotoma analysis with the En-HMAX model revealed that object recognition is sensitive to the availability of data in the centre of the image, whereas scene recognition is sensitive to the availability of data in the periphery. Similarly, results of deep learning models have shown that the classification accuracy diminishes dramatically in the absence of peripheral vision. These differences led us to further analyse the performance of the En-HMAX model with the parafoveal versus peripheral areas of vision, in a second study. Results of the second study show that approximately 50% of the visual field would be sufficient to achieve 96% accuracy in the classification of unseen images. The En-HMAX model adopts a relative order of importance, similar to the human visual system, depending on the image category. We showed that utilizing the relevant regions of vision can significantly reduce the image processing time and size. |
---|---|
ISSN: | 1047-3203 1095-9076 |
DOI: | 10.1016/j.jvcir.2019.102698 |
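
The window and scotoma conditions mentioned in the summary can be illustrated with a small sketch. It is not the authors' implementation: the circular shape, the image-centred placement, the fill value, and the names `radial_mask`, `occlude`, and `visible_fraction` are all assumptions made for illustration. A "window" keeps only the central disc of the image visible, while a "scotoma" hides that disc and keeps the periphery.

```python
import math


def radial_mask(height, width, radius, mode):
    """Return a height x width grid of 0/1 flags, where 1 marks a visible pixel.

    mode="window":  only pixels within `radius` of the image centre stay visible.
    mode="scotoma": those central pixels are hidden and the periphery stays visible.
    The circular, centred shape is an assumption, not taken from the paper.
    """
    if mode not in ("window", "scotoma"):
        raise ValueError("mode must be 'window' or 'scotoma'")
    cy, cx = (height - 1) / 2, (width - 1) / 2
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            inside = math.hypot(y - cy, x - cx) <= radius
            row.append(int(inside if mode == "window" else not inside))
        mask.append(row)
    return mask


def occlude(image, mask, fill=0):
    """Replace hidden pixels with `fill` (a hypothetical occluder value)."""
    return [
        [pixel if keep else fill for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


def visible_fraction(mask):
    """Share of pixels left visible, e.g. to compare window and scotoma sizes."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total
```

For a given radius the two masks are complementary, so their visible fractions sum to one; sweeping `radius` gives a series of conditions in which the amount of central or peripheral data shown to a classifier varies.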