Equivariant Multi-Modality Image Fusion

Bibliographic Details
Published in: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 25912-25921
Main Authors: Zhao, Zixiang; Bai, Haowen; Zhang, Jiangshe; Zhang, Yulun; Zhang, Kai; Xu, Shuang; Chen, Dongdong; Timofte, Radu; Van Gool, Luc
Format: Conference Proceeding
Language: English
Published: IEEE, 16.06.2024

Summary: Multi-modality image fusion is a technique that combines information from different sensors or modalities, enabling the fused image to retain complementary features from each modality, such as functional highlights and texture details. However, effective training of such fusion models is challenging due to the scarcity of ground truth fusion data. To tackle this issue, we propose the Equivariant Multi-Modality imAge fusion (EMMA) paradigm for end-to-end self-supervised learning. Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations. Consequently, we introduce a novel training paradigm that encompasses a fusion module, a pseudo-sensing module, and an equivariant fusion module. These components enable the network training to follow the principles of the natural sensing-imaging process while satisfying the equivariant imaging prior. Extensive experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images, concurrently facilitating downstream multi-modal segmentation and detection tasks. The code is available at https://github.com/Zhaozixiang1228/MMIF-EMMA.
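
To illustrate the kind of self-supervised objective the summary describes, below is a minimal, hypothetical PyTorch sketch combining a fusion module, a pseudo-sensing module, and an equivariance constraint. The names fusion_net and sensing_net, the 90-degree rotations used as the transformation, and the L1 terms are illustrative assumptions, not the paper's actual architecture or losses; consult the linked repository for the authors' implementation.

import torch
import torch.nn.functional as F

# Hypothetical stand-ins for EMMA's modules (the real implementation is at
# https://github.com/Zhaozixiang1228/MMIF-EMMA):
#   fusion_net(ir, vis) -> fused image                 (fusion module)
#   sensing_net(fused)  -> (pseudo_ir, pseudo_vis)     (pseudo-sensing module)
# Tensors are assumed to be in NCHW layout.

def transform(x, k):
    # A k*90-degree rotation serves as the image transformation here.
    return torch.rot90(x, k, dims=(2, 3))

def emma_style_loss(fusion_net, sensing_net, ir, vis):
    # 1) Fuse the two modalities.
    fused = fusion_net(ir, vis)

    # 2) Pseudo-sense the fused image back into source-like modalities.
    pseudo_ir, pseudo_vis = sensing_net(fused)

    # 3) Equivariance constraint: fusing the transformed pseudo inputs
    #    should match the transformed fused image.
    k = int(torch.randint(0, 4, (1,)))
    refused = fusion_net(transform(pseudo_ir, k), transform(pseudo_vis, k))
    loss_equi = F.l1_loss(refused, transform(fused, k))

    # 4) Sensing consistency (a simplified surrogate term): the pseudo
    #    modalities should stay close to the real inputs.
    loss_sense = F.l1_loss(pseudo_ir, ir) + F.l1_loss(pseudo_vis, vis)

    return loss_equi + loss_sense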
ISSN: 2575-7075
DOI: 10.1109/CVPR52733.2024.02448