Deep Multimodal Fusion of Visual and Auditory Features for Robust Material Recognition

Bibliographic Details
Published in: International Journal of Computers, Communications & Control, Vol. 19, No. 5
Main Authors: Shi, Yifei; Ong, Huei Ruey; Yang, Shuai; Fan, Yuxin
Format: Journal Article
Language: English
Published: Oradea: Agora University of Oradea, 01.10.2024
Summary: This paper presents a deep neural network that fuses visual and auditory data to improve material recognition. Traditional recognition techniques that rely on a single data modality are limited in accuracy and robustness, especially in complex real-world environments. To address these challenges, we develop a multimodal fusion-based model. The proposed approach first extracts features from input images and sounds separately, using CNNs and spectral analysis respectively. A concatenation layer then integrates the visual and auditory features. Extensive experiments show that the model outperforms unimodal methods in material classification, reaching 100% test accuracy across seven material types. The multimodal fusion model also shows stronger resilience to noise and illumination variations. This research provides a foundation for robust material perception in intelligent systems.
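The summary describes the pipeline only at a high level: separate visual and auditory feature extraction, followed by concatenation and classification into seven material types. The following is a minimal, dependency-free sketch of that data flow under stated assumptions. It is not the authors' implementation. The CNN is replaced by a hypothetical per-channel average-pooling placeholder (`visual_features`). The "spectral analysis" step is assumed to be a band-averaged DFT magnitude spectrum (`audio_features`, with a hypothetical `num_bands` parameter). The classifier is assumed to be a single linear layer with softmax, and its weights are hypothetical and untrained.

```python
import cmath
import math
from typing import List, Sequence, Tuple

NUM_CLASSES = 7  # seven material types, as reported in the summary


def magnitude_spectrum(signal: Sequence[float]) -> List[float]:
    """Naive O(n^2) DFT; returns magnitudes of bins 0..n//2 (non-negative frequencies)."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def audio_features(signal: Sequence[float], num_bands: int = 4) -> List[float]:
    """Assumed spectral features: mean magnitude over `num_bands` contiguous frequency bands."""
    spec = magnitude_spectrum(signal)
    if len(spec) < num_bands:
        raise ValueError("signal too short for the requested number of bands")
    edges = [i * len(spec) // num_bands for i in range(num_bands + 1)]
    return [sum(spec[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


def visual_features(image: Sequence[Sequence[Tuple[float, float, float]]]) -> List[float]:
    """Hypothetical placeholder for a CNN embedding: per-channel mean of an RGB image."""
    pixels = [px for row in image for px in row]
    return [sum(px[c] for px in pixels) / len(pixels) for c in range(3)]


def fuse(visual: Sequence[float], audio: Sequence[float]) -> List[float]:
    """Concatenation fusion: place the visual and auditory vectors end to end."""
    return list(visual) + list(audio)


def classify(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Assumed head: linear layer plus numerically stable softmax over the classes."""
    logits = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy inputs: a 2x2 RGB image and a 16-sample cosine at frequency bin 3.
    image = [[(0.2, 0.4, 0.6), (0.4, 0.4, 0.2)], [(0.6, 0.2, 0.4), (0.8, 0.0, 0.0)]]
    sound = [math.cos(2 * math.pi * 3 * t / 16) for t in range(16)]
    fused = fuse(visual_features(image), audio_features(sound))
    # Hypothetical untrained weights; a real model would learn these from data.
    weights = [[0.1 * (i - j) for j in range(len(fused))] for i in range(NUM_CLASSES)]
    probs = classify(fused, weights, [0.0] * NUM_CLASSES)
    print(len(fused), max(range(NUM_CLASSES), key=probs.__getitem__))
```

Concatenation keeps both feature vectors intact and leaves it to the downstream layer to learn how much weight each modality deserves. In this sketch, the 3 visual values and 4 spectral band values become a single 7-dimensional input to the classifier.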
ISSN: 1841-9836; 1841-9844
DOI: 10.15837/ijccc.2024.5.6457