Ensemble CNN-ViT Using Feature-Level Fusion for Gait Recognition


Bibliographic Details
Published in: IEEE Access, Vol. 12, pp. 108573–108583
Main Authors: Mogan, Jashila Nair; Lee, Chin Poo; Lim, Kian Ming
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024

More Information
Summary: Individual deep learning models showcase impressive performance; however, a single model may fall short of capturing the full spectrum of intricate patterns present in the input data. Relying solely on one model can therefore hamper the attainment of optimal results and broader generalization. In light of this, the paper presents an ensemble method that leverages the strengths of multiple Convolutional Neural Networks (CNNs) and Transformer models to elevate gait recognition performance. Additionally, a novel gait representation named windowed Gait Energy Image (GEI) is introduced, obtained by averaging gait frames irrespective of gait cycles. First, the windowed GEI is input to the CNN and Transformer models to learn significant gait features. Each model is followed by a Multilayer Perceptron (MLP) that encodes the relationship between the extracted features and the corresponding class labels. Subsequently, the extracted gait features from each model are flattened and concatenated into a cohesive feature representation before passing through another MLP for subject classification. The performance of the proposed method was assessed on three datasets: OU-ISIR dataset D, CASIA-B, and OU-LP. Experimental results demonstrated remarkable improvements over existing methods across all three datasets: the proposed method achieved accuracy rates of 100% on OU-ISIR D, 99.93% on CASIA-B, and 99.94% on OU-LP, showcasing the superior performance of the Ensemble CNN-ViT model with feature-level fusion compared to state-of-the-art methods.
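The two core ideas in the summary — averaging frames into fixed-size windows irrespective of gait cycles, and flattening then concatenating per-model features before a final classifier — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window length, frame size, and stand-in "backbone features" are assumptions for demonstration only.

```python
import numpy as np

def windowed_gei(frames: np.ndarray, window: int) -> np.ndarray:
    """Average consecutive silhouette frames over fixed-size windows,
    irrespective of gait-cycle boundaries (windowed Gait Energy Image)."""
    n = frames.shape[0] // window * window  # drop any incomplete trailing window
    groups = frames[:n].reshape(-1, window, *frames.shape[1:])
    return groups.mean(axis=1)  # one GEI per window: (n // window, H, W)

def fuse_features(feature_list) -> np.ndarray:
    """Feature-level fusion: flatten each model's feature map and
    concatenate them into one vector for the classification MLP."""
    return np.concatenate([np.asarray(f).ravel() for f in feature_list])

# 60 binary silhouette frames of (hypothetical) size 64x44, window of 15
frames = (np.random.rand(60, 64, 44) > 0.5).astype(np.float32)
geis = windowed_gei(frames, window=15)           # shape (4, 64, 44)

# toy feature maps standing in for CNN and ViT backbone outputs
fused = fuse_features([np.zeros((8, 8)), np.ones(10)])  # length 64 + 10 = 74
```

In the paper's pipeline the fused vector would feed a final MLP that outputs subject-class scores; here the backbones are replaced by placeholder arrays to keep the sketch self-contained.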
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3439602