A multi-view-CNN framework for deep representation learning in image classification

Bibliographic Details
Published in	Computer vision and image understanding Vol. 232; p. 103687
Main Authors	Pintelas, Emmanuel; Livieris, Ioannis E.; Kotsiantis, Sotiris; Pintelas, Panagiotis
Format	Journal Article
Language	English
Published	Elsevier Inc 01.07.2023
Subjects	Convolutional neural networks; Deep learning; Dimensionality reduction; Feature augmentation; Image classification; Transfer learning

More Information
Summary:	Deep representation learning in image classification is an area of computer vision where deep Convolutional Neural Networks (CNNs) have flourished. Nevertheless, developing an efficient image recognition model for real-world applications is a challenging task, since image datasets are characterized by instances with a large amount of noise and redundant information. Thus, it is essential to incorporate an intelligent feature extraction and filtering method in order to create robust and efficient image representations. In this work, we propose a Multi-View-CNN framework which drastically boosts the performance of pre-trained CNN models, such as ResNet and VGG, in image classification applications. In this approach, different types of views of the same initial image are used to extract different types of features with pre-trained CNN models. To reduce the very high dimensionality of the raw CNN output features and create a robust image representation, the Principal Component Analysis (PCA) dimension reduction method is applied to the features of each view. Then, all the extracted feature vectors are concatenated, building a final composite feature representation of the initial image dataset. Finally, this augmented feature vector is used to train a linear model (Logistic Regression) that performs the final classification task. The main findings of this work are summarized as follows. First, the proposed Multi-View-CNN framework drastically increased the performance of pre-trained CNN models. Second, incorporating PCA as a final layer of the main CNN topology, instead of the classical dimension reduction layers such as Average and Max Pooling operations, significantly improved the performance.
The full implementation code of this framework, together with the datasets used in our experimental simulations, is available in our public GitHub repository at the following link: https://github.com/EmmanuelPintelas/A-Multi-View-CNN-Framework-for-Deep-Representation-Learning-in-Image-Classification.
•We propose a novel Multi-View-CNN framework which drastically boosts the performance of pre-trained CNN models, such as ResNet, in image classification applications.
•The Multi-View component augments the initial image representation with different types of image views, in order to add useful information by revealing hidden patterns.
•For every image view, a CNN extracts features, which are compressed by a PCA component and then concatenated to feed a linear model.
•We replace the commonly used Average Pooling operation with a PCA component in order to compress the output feature maps more efficiently, improving the final performance.
•We replace the Fully Connected neural network of the CNN’s final output layer with a simple linear model, which provides an interpretation of the most significant views.
ISSN:	1077-3142 1090-235X
DOI:	10.1016/j.cviu.2023.103687
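
The per-view compress-and-concatenate step described in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see their repository for that). The three views in `make_views`, the random-projection `standin_extractor` that takes the place of a pre-trained CNN such as ResNet or VGG, and all sizes are assumptions made only to keep the example self-contained.

```python
import numpy as np


def make_views(images):
    """Build hypothetical views of a batch of grayscale images, shape (N, H, W).

    These three views are illustrative only; the summary does not list
    the views actually used in the paper.
    """
    original = images
    flipped = images[:, :, ::-1]  # horizontal mirror
    # Absolute horizontal differences as a crude edge-like view.
    edges = np.abs(np.diff(images, axis=2, prepend=images[:, :, :1]))
    return [original, flipped, edges]


def standin_extractor(view, out_dim=256, seed=0):
    """Assumed stand-in for a pre-trained CNN: fixed random projection + ReLU."""
    rng = np.random.default_rng(seed)
    flat = view.reshape(len(view), -1)
    weights = rng.standard_normal((flat.shape[1], out_dim)) / np.sqrt(flat.shape[1])
    return np.maximum(flat @ weights, 0.0)


def pca_fit(x, n_components):
    """Return the feature mean and the top principal directions of x."""
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(x, mean, components):
    """Project x onto the principal directions."""
    return (x - mean) @ components.T


def multi_view_features(images, n_components=8):
    """Extract, PCA-compress, and concatenate features for every view."""
    blocks = []
    for i, view in enumerate(make_views(images)):
        feats = standin_extractor(view, seed=i)
        # In a real experiment PCA would be fitted on the training split only.
        mean, comps = pca_fit(feats, n_components)
        blocks.append(pca_transform(feats, mean, comps))
    return np.concatenate(blocks, axis=1)


rng = np.random.default_rng(42)
images = rng.random((40, 16, 16))        # 40 synthetic 16x16 images
features = multi_view_features(images)
print(features.shape)                    # (40, 24): 3 views x 8 components
```

The resulting composite matrix is what would be passed to the linear classifier; scikit-learn's `LogisticRegression` would be one possible choice. Because each view occupies its own block of columns, the coefficients of a linear model can be grouped per view, which is one way to read the claim that the linear model helps interpret the most significant views.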