Cross-Modal Information Recovery and Enhancement Using Multiple-Input-Multiple-Output Variational Autoencoder

Bibliographic Details
Published in: IEEE Internet of Things Journal, Vol. 11, No. 15, pp. 26470-26480
Main Author: Liang, Jessica E.
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2024
Summary: Motivated by the cross-modal information processing mechanisms in the brains of humans, other vertebrates, and invertebrates, we propose a multiple-input-multiple-output (MIMO) variational autoencoder (VAE) and apply it to cross-modal information recovery and enhancement. For a cross-modal system with two modalities, our MIMO VAE consists of two encoders and two decoders, and it integrates the signals of the different modalities using a fusion mechanism modeled on cross-modal information fusion in the human brain. To reduce the computational complexity of the MIMO VAE, we propose linearizing the encoders with a compression matrix. We analyze the space and time complexity of the proposed MIMO VAE and prove that, under certain conditions, it can achieve lossless performance. Simulation results show that our linearized-encoder VAE (LE-VAE) performs much better than the current VAE with Kullback-Leibler (KL) divergence (KL-VAE), and that the MIMO VAE can successfully perform visual and audio information recovery and enhancement. Our weighted approach to visual and audio enhancement outperforms the unweighted approach. The MIMO VAE could be applied to the multimodal Internet of Things and other systems.
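To make the two-modality architecture in the summary concrete, here is a minimal PyTorch sketch of a MIMO VAE with two encoders, two decoders, and a weighted fusion of the modality posteriors. This is a sketch under stated assumptions, not the paper's implementation: the layer sizes, the scalar fusion weight w_a, the weighted-averaging fusion rule, and the ELBO-style loss in the demo are all illustrative, and the paper's linear "compression matrix" encoder (LE-VAE) is only indicated in a comment.

```python
# Minimal sketch of a two-modality MIMO VAE (two encoders, two decoders).
# All dimensions, the fusion rule, and the loss are illustrative assumptions.
import torch
import torch.nn as nn


class MIMOVAE(nn.Module):
    def __init__(self, dim_a=784, dim_b=128, latent=32, w_a=0.5):
        super().__init__()
        self.w_a = w_a  # assumed scalar weight for modality A in the fusion
        # Each encoder maps its modality to the mean and log-variance of a latent.
        self.enc_a = nn.Sequential(
            nn.Linear(dim_a, 256), nn.ReLU(), nn.Linear(256, 2 * latent))
        self.enc_b = nn.Sequential(
            nn.Linear(dim_b, 256), nn.ReLU(), nn.Linear(256, 2 * latent))
        # LE-VAE variant (assumption): replace a nonlinear encoder with a single
        # bias-free linear map, i.e. a compression matrix:
        #     self.enc_a = nn.Linear(dim_a, 2 * latent, bias=False)
        # Each decoder reconstructs its modality from the fused latent code.
        self.dec_a = nn.Sequential(
            nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim_a))
        self.dec_b = nn.Sequential(
            nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim_b))

    def reparameterize(self, mu, logvar):
        # Standard VAE reparameterization trick: z = mu + sigma * eps.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, x_a, x_b):
        mu_a, logvar_a = self.enc_a(x_a).chunk(2, dim=-1)
        mu_b, logvar_b = self.enc_b(x_b).chunk(2, dim=-1)
        # Weighted fusion of the two modality posteriors (an assumption standing
        # in for the paper's brain-inspired fusion mechanism).
        mu = self.w_a * mu_a + (1.0 - self.w_a) * mu_b
        logvar = self.w_a * logvar_a + (1.0 - self.w_a) * logvar_b
        z = self.reparameterize(mu, logvar)
        return self.dec_a(z), self.dec_b(z), mu, logvar


if __name__ == "__main__":
    model = MIMOVAE()
    x_a, x_b = torch.randn(8, 784), torch.randn(8, 128)
    rec_a, rec_b, mu, logvar = model(x_a, x_b)
    # ELBO-style objective: reconstruct both modalities plus a KL prior term.
    recon = nn.functional.mse_loss(rec_a, x_a) + nn.functional.mse_loss(rec_b, x_b)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    print((recon + kl).item())
```

Because both decoders read the same fused latent, either modality can in principle be recovered or enhanced from the pair, which matches the recovery and enhancement use the summary describes; a per-dimension or learned fusion weight would be a natural refinement of the fixed scalar w_a used here.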
ISSN: 2327-4662
DOI: 10.1109/JIOT.2024.3396401