Cross-Modal Information Recovery and Enhancement Using Multiple-Input-Multiple-Output Variational Autoencoder
Published in | IEEE internet of things journal Vol. 11; no. 15; pp. 26470 - 26480 |
---|---|
Format | Journal Article |
Language | English |
Published | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2024 |
Summary: | Motivated by the cross-modal information processing mechanisms of the human brain, vertebrates, and invertebrates, we propose a multiple-input-multiple-output (MIMO) variational autoencoder (VAE) and apply it to cross-modal information recovery and enhancement. For a cross-modal system with two modalities, our MIMO VAE consists of two encoders and two decoders. We use the human brain's cross-modal information fusion mechanism to integrate signals from different modalities in the MIMO VAE. To reduce the computational complexity of the MIMO VAE, we propose linearizing the encoders using a compression matrix. The space and time complexity of the proposed MIMO VAE are analyzed. A theoretical proof shows that the MIMO VAE can achieve lossless performance under certain conditions. Simulation results show that our linearized-encoder VAE (LE-VAE) performs much better than the current VAE with Kullback-Leibler (KL) divergence (KL-VAE), and illustrate that the MIMO VAE can successfully perform visual and audio information recovery and enhancement. Our weighted approach to visual and audio enhancement performs better than the unweighted approach. The MIMO VAE could be applied to the multimodal Internet of Things and other systems. |
---|---|
ISSN: | 2327-4662 |
DOI: | 10.1109/JIOT.2024.3396401 |
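
Illustrative note: the Summary mentions two encoders and two decoders, encoders linearized with a compression matrix, and weighted fusion for enhancement. The toy sketch below shows one way such a linear pipeline could look. It is not the paper's model or code. The matrices W_VISUAL and W_AUDIO, the helper names, the shared-latent setup, and the fusion weights are all assumptions chosen so the arithmetic stays exact. In this toy, reconstruction is lossless only because the rows of each matrix are orthonormal and each signal lies in the row space of its matrix; the paper's actual conditions are not stated in this record.

```python
# Toy linear two-modality sketch. Everything here is a hypothetical
# illustration, not the MIMO VAE described in the article.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]

def encode(W, x):
    """Linearized encoder: compress signal x to a latent code z = W x."""
    return matvec(W, x)

def decode(W, z):
    """Linear decoder x_hat = W^T z. W^T equals the pseudo-inverse of W
    only when the rows of W are orthonormal (assumed here)."""
    return matvec(transpose(W), z)

def fuse(z_a, z_b, weight_a):
    """Weighted fusion of two latent codes, with weight_a in [0, 1]."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(z_a, z_b)]

def squared_error(u, v):
    """Sum of squared differences between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Assumed compression matrices (4-dim signal -> 2-dim latent), orthonormal rows.
W_VISUAL = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0]]
W_AUDIO = [[0.5, 0.5, 0.5, 0.5],
           [0.5, -0.5, 0.5, -0.5]]

# Assumption: both modalities are generated from one shared latent code.
shared_latent = [2.0, -1.0]
x_visual = decode(W_VISUAL, shared_latent)
x_audio = decode(W_AUDIO, shared_latent)

# Recovery: the audio signal is missing, so rebuild it from the visual code.
z_from_visual = encode(W_VISUAL, x_visual)
audio_recovered = decode(W_AUDIO, z_from_visual)

# Enhancement: the audio signal is corrupted, the visual signal is clean.
noisy_audio = [a + n for a, n in zip(x_audio, [0.5, -0.5, 0.5, -0.5])]
z_from_noisy_audio = encode(W_AUDIO, noisy_audio)

# Compare a fusion weighted toward the clean modality with an equal-weight one.
z_weighted = fuse(z_from_visual, z_from_noisy_audio, 0.75)
z_unweighted = fuse(z_from_visual, z_from_noisy_audio, 0.5)

err_weighted = squared_error(z_weighted, shared_latent)
err_unweighted = squared_error(z_unweighted, shared_latent)
```

In this contrived setup, audio_recovered equals x_audio exactly, and the fusion weighted toward the clean modality lands closer to the shared latent code than the equal-weight fusion. That outcome follows from the chosen numbers and says nothing about the simulation results reported in the article.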