Multi-modality MRI fusion with patch complementary pre-training for internet of medical things-based smart healthcare
Published in: Information Fusion, Vol. 107, p. 102342
Main Authors: , , ,
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.07.2024
Summary: Magnetic Resonance Imaging (MRI) is a pivotal neuroimaging technique capable of generating images with various contrasts, known as multi-modal images. The integration of these diverse modalities is essential for improving model performance across various tasks. However, in real clinical scenarios, acquiring MR images for all modalities is frequently hindered by factors such as patient comfort and scanning costs. Therefore, effectively fusing the available modalities to synthesize missing ones has become a research hotspot in smart healthcare, particularly in the context of the Internet of Medical Things (IoMT). In this study, we introduce a multi-modal coordinated fusion network (MCF-Net) with Patch Complementarity Pre-training. The network leverages the complementarity and correlation between different modalities to fuse multi-modal MR images, addressing challenges in the IoMT. Specifically, we first employ a Patch Complementarity Mask Autoencoder (PC-MAE) for self-supervised pre-training. A complementarity learning mechanism aligns the masks and visible tokens between two modalities, and a dual-branch MAE architecture with a shared encoder–decoder facilitates cross-modal interactions among the mask tokens. During the fine-tuning phase, we incorporate an Attention-Driven Fusion (ADF) module into the MCF-Net, which synthesizes missing-modality images by fusing multi-modal features from the pre-trained PC-MAE encoder. Additionally, we leverage the pre-trained encoder to extract high-level features from both synthetic and corresponding real images, enforcing consistency throughout training. Our experimental results show a notable performance improvement across various modalities, outperforming state-of-the-art techniques.
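To illustrate the patch complementarity idea described in the summary, the following is a minimal sketch of how masks for two modalities could be sampled so that a patch hidden in one modality stays visible in the other. This is an assumption about the mechanism (the abstract does not give the masking ratio, sampling rule, or framework); a PyTorch-style pipeline and a 50% ratio are assumed here, not taken from the paper.

```python
import torch

def complementary_patch_masks(num_patches: int, mask_ratio: float = 0.5, generator=None):
    """Sample a random patch mask for modality A and use its complement for modality B.

    Illustrative sketch only: the exact PC-MAE masking strategy is not specified in the
    abstract. Returns boolean masks of shape (num_patches,), where True = masked
    (i.e., the patch must be reconstructed for that modality).
    """
    scores = torch.rand(num_patches, generator=generator)
    num_masked = int(num_patches * mask_ratio)
    # Patches with the highest random scores are masked for modality A ...
    masked_a = torch.zeros(num_patches, dtype=torch.bool)
    masked_a[scores.argsort(descending=True)[:num_masked]] = True
    # ... and the complementary set is masked for modality B, so every patch
    # position is visible in at least one modality during pre-training.
    masked_b = ~masked_a
    return masked_a, masked_b

if __name__ == "__main__":
    m_a, m_b = complementary_patch_masks(num_patches=196, mask_ratio=0.5)
    # Complementarity check: no patch is masked in both modalities.
    assert not torch.any(m_a & m_b)
    print(f"masked in A: {m_a.sum().item()}, masked in B: {m_b.sum().item()}")
```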
Highlights:
• Propose a multi-modal medical image fusion framework with patch complementary pre-training.
• Design a novel masking alignment strategy to learn complementary information between modalities.
• Introduce an attention-driven fusion module to aggregate multi-modal features.
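The attention-driven fusion module mentioned above is likewise not detailed in the abstract. The sketch below shows one generic way such a fusion could look, using plain cross-attention between two modalities' token features followed by a residual sum; the layer sizes, the use of `nn.MultiheadAttention`, and the fusion rule are all assumptions for illustration, not the paper's ADF design.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Generic attention-based fusion of two modality token sequences (sketch only)."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
        # feats_a, feats_b: (batch, num_tokens, dim) features from a pre-trained encoder.
        # Modality-A tokens query modality-B tokens for complementary context.
        attended, _ = self.cross_attn(query=feats_a, key=feats_b, value=feats_b)
        # Residual fusion keeps modality-A content while injecting modality-B information.
        return self.norm(feats_a + attended)

if __name__ == "__main__":
    fuse = AttentionFusion(dim=256, num_heads=8)
    a = torch.randn(2, 196, 256)
    b = torch.randn(2, 196, 256)
    print(fuse(a, b).shape)  # torch.Size([2, 196, 256])
```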
ISSN: 1566-2535, 1872-6305
DOI: 10.1016/j.inffus.2024.102342