Cross-Modal Information Recovery and Enhancement Using Multiple-Input-Multiple-Output Variational Autoencoder

Bibliographic Details
Published in: IEEE Internet of Things Journal, Vol. 11, No. 15, pp. 26470-26480
Main Author: Liang, Jessica E.
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2024
Summary: Motivated by the cross-modal information processing mechanisms in the brains of humans, other vertebrates, and invertebrates, we propose a multiple-input-multiple-output (MIMO) variational autoencoder (VAE) and apply it to cross-modal information recovery and enhancement. For a cross-modal system with two modalities, our MIMO VAE consists of two encoders and two decoders, and it integrates the signals of the different modalities using a fusion mechanism modeled on cross-modal information fusion in the human brain. To reduce the computational complexity of the MIMO VAE, we propose linearizing the encoders with a compression matrix. We analyze the space and time complexity of the proposed MIMO VAE and prove that, under certain conditions, it can achieve lossless performance. Simulation results show that our linearized-encoder VAE (LE-VAE) performs much better than the current VAE with Kullback-Leibler (KL) divergence (KL-VAE), and that the MIMO VAE can successfully perform visual and audio information recovery and enhancement. Our weighted approach to visual and audio enhancement outperforms the unweighted approach. The MIMO VAE could be applied to the multimodal Internet of Things and other systems.
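To make the two-modality architecture in the summary concrete, here is a minimal PyTorch sketch of a MIMO VAE with two encoders, two decoders, and a weighted fusion of the modality posteriors. This is a sketch under stated assumptions, not the paper's implementation: the layer sizes, the scalar fusion weight w_a, the weighted-averaging fusion rule, and the ELBO-style loss in the demo are all illustrative, and the paper's linear "compression matrix" encoder (LE-VAE) is only indicated in a comment.

```python
# Minimal sketch of a two-modality MIMO VAE (two encoders, two decoders).
# All dimensions, the fusion rule, and the loss are illustrative assumptions.
import torch
import torch.nn as nn


class MIMOVAE(nn.Module):
    def __init__(self, dim_a=784, dim_b=128, latent=32, w_a=0.5):
        super().__init__()
        self.w_a = w_a  # assumed scalar weight for modality A in the fusion
        # Each encoder maps its modality to the mean and log-variance of a latent.
        self.enc_a = nn.Sequential(
            nn.Linear(dim_a, 256), nn.ReLU(), nn.Linear(256, 2 * latent))
        self.enc_b = nn.Sequential(
            nn.Linear(dim_b, 256), nn.ReLU(), nn.Linear(256, 2 * latent))
        # LE-VAE variant (assumption): replace a nonlinear encoder with a single
        # bias-free linear map, i.e. a compression matrix:
        #     self.enc_a = nn.Linear(dim_a, 2 * latent, bias=False)
        # Each decoder reconstructs its modality from the fused latent code.
        self.dec_a = nn.Sequential(
            nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim_a))
        self.dec_b = nn.Sequential(
            nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim_b))

    def reparameterize(self, mu, logvar):
        # Standard VAE reparameterization trick: z = mu + sigma * eps.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, x_a, x_b):
        mu_a, logvar_a = self.enc_a(x_a).chunk(2, dim=-1)
        mu_b, logvar_b = self.enc_b(x_b).chunk(2, dim=-1)
        # Weighted fusion of the two modality posteriors (an assumption standing
        # in for the paper's brain-inspired fusion mechanism).
        mu = self.w_a * mu_a + (1.0 - self.w_a) * mu_b
        logvar = self.w_a * logvar_a + (1.0 - self.w_a) * logvar_b
        z = self.reparameterize(mu, logvar)
        return self.dec_a(z), self.dec_b(z), mu, logvar


if __name__ == "__main__":
    model = MIMOVAE()
    x_a, x_b = torch.randn(8, 784), torch.randn(8, 128)
    rec_a, rec_b, mu, logvar = model(x_a, x_b)
    # ELBO-style objective: reconstruct both modalities plus a KL prior term.
    recon = nn.functional.mse_loss(rec_a, x_a) + nn.functional.mse_loss(rec_b, x_b)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    print((recon + kl).item())
```

Because both decoders read the same fused latent, either modality can in principle be recovered or enhanced from the pair, which matches the recovery and enhancement use the summary describes; a per-dimension or learned fusion weight would be a natural refinement of the fixed scalar w_a used here.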
ISSN: 2327-4662
DOI: 10.1109/JIOT.2024.3396401