Multi-Modal Image Super-Resolution via Deep Convolutional Transform Learning

Bibliographic Details
Published in: 2024 32nd European Signal Processing Conference (EUSIPCO), pp. 671-675
Main Authors: Kumar, Kriti; Majumdar, Angshul; Kumar, A Anil; Chandra, M Girish
Format: Conference Proceeding
Language: English
Published: European Association for Signal Processing (EURASIP), 26.08.2024
ISSN: 2076-1465
DOI: 10.23919/EUSIPCO63174.2024.10715007

Summary: Real-world situations often involve processing data from diverse imaging modalities such as Multispectral (MS), Near Infrared (NIR), and RGB, each capturing different aspects of the same scene. These modalities often differ in spatial and spectral resolution. Hence, Multi-modal Image Super-Resolution (MISR) techniques are required to improve the spatial/spectral resolution of the target modality with the help of a High Resolution (HR) guidance modality that shares common features such as textures, edges, and other structures. Traditional MISR approaches based on Convolutional Neural Networks (CNNs) typically employ an encoder-decoder architecture, which is prone to overfitting in data-limited scenarios. This work proposes a novel deep convolutional analysis sparse coding method that uses convolutional transforms within a fusion framework, eliminating the need for a decoder network; this reduces the number of trainable parameters and makes the method better suited to data-limited applications. A joint optimization framework is proposed that learns deep convolutional transforms for Low Resolution (LR) images of the target modality and HR images of the guidance modality, along with a fusion transform that combines these transform features to reconstruct HR images of the target modality. In contrast to dictionary-based synthesis sparse coding methods for MISR, the proposed approach offers improved performance with reduced complexity, leveraging the inherent advantages of transform learning. The efficacy of the proposed method is demonstrated on RGB-NIR and RGB-MS datasets, showing superior reconstruction performance compared to state-of-the-art techniques without introducing additional artifacts from the guidance image.
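The fusion idea in the abstract can be illustrated with a minimal numpy sketch: each modality is passed through a bank of convolution filters (a convolutional analysis transform), the features are sparsified, and a learned fusion transform maps the stacked features directly to the HR target estimate, with no decoder. All names, sizes, and the random filters below are hypothetical stand-ins; the paper learns the transforms via its joint optimization, which is not reproduced here.

```python
import numpy as np

def conv2d_same(image, kernel):
    # 'same'-size 2-D correlation in plain numpy (no learned machinery).
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def conv_transform(image, filters):
    # Convolutional analysis transform: one feature map per filter.
    return np.stack([conv2d_same(image, f) for f in filters])

def hard_threshold(x, tau=0.5):
    # Sparsify the transform features (analysis sparse coding step).
    return np.where(np.abs(x) > tau, x, 0.0)

rng = np.random.default_rng(0)

# Hypothetical inputs: an 8x8 LR target-modality image (upsampled to the
# guidance grid) and an 8x8 HR guidance-modality image of the same scene.
lr_target = rng.standard_normal((8, 8))
hr_guidance = rng.standard_normal((8, 8))

# Stand-ins for the learned 3x3 convolutional transforms of each branch;
# random here, learned by the joint optimization in the paper.
T_target = rng.standard_normal((4, 3, 3))
T_guidance = rng.standard_normal((4, 3, 3))

z_t = hard_threshold(conv_transform(lr_target, T_target))
z_g = hard_threshold(conv_transform(hr_guidance, T_guidance))

# Fusion transform: a linear map over the stacked feature channels that
# reconstructs the HR target image directly -- no decoder network.
features = np.concatenate([z_t, z_g])              # shape (8, 8, 8)
W_fuse = rng.standard_normal(features.shape[0]) / features.shape[0]
hr_estimate = np.tensordot(W_fuse, features, axes=1)

print(hr_estimate.shape)  # (8, 8)
```

Because reconstruction is a single fused analysis pass rather than an encoder-decoder round trip, the trainable parameters are just the filter banks and the fusion weights, which is the parameter saving the abstract highlights.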