VAM-Net: Vegetation-Attentive deep network for Multi-modal fusion of visible-light and vegetation-sensitive images


Bibliographic Details
Published in International Journal of Applied Earth Observation and Geoinformation, Vol. 127, p. 103642
Main Authors Zang, Yufu, Wang, Shuye, Guan, Haiyan, Peng, Daifeng, Chen, Jike, Chen, Yanming, Delavar, Mahmoud R.
Format Journal Article
Language English
Published Elsevier B.V., 01.03.2024

Summary:
Highlights:
• A radiometric correction mechanism was proposed.
• A combined attention mechanism was designed to select useful information while avoiding excessive parameters.
• We modify ResNet and VGG (Visual Geometry Group) to capture deep features from vegetation-sensitive and visible-light images, respectively.
• A novel loss function was proposed that simultaneously constrains the feature learning and matching processes.

Multi-modal fusion of remote sensing images poses challenges because of the intricate imaging mechanisms and variations in radiation across different modalities. The fusion of visible-light and vegetation-sensitive images, in particular, encounters these difficulties. Traditional methods have seldom considered the differing imaging mechanisms and radiation differences between modalities, resulting in discrepancies in the corresponding features. To address this issue, we propose VAM-Net (Vegetation-Attentive Multi-modal deep Network), which combines a radiometric correction mechanism with a lightweight multi-modal adaptive feature selection method for fusing multi-modal images. First, the vegetation index (VDVI) is integrated into visible-light images to mitigate the radiometric differences between visible-light and vegetation-sensitive images (e.g., infrared and red-edge images). Then, a two-branch network incorporating attention mechanisms is designed to independently capture texture features and select similar features across the two image modalities. Last, a new loss function is presented to ensure the learned features are suitable for multi-modal fusion. VAM-Net is evaluated on visible-light and vegetation-sensitive images from three different areas; the experimental results show that it attains an average precision of 67.02%, an average recall of 35.49%, and an average RMSE of 2.191 px, demonstrating the accuracy and robustness of VAM-Net in multi-modal image fusion.
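The record gives no implementation details, but the VDVI mentioned above is the standard visible-band difference vegetation index, VDVI = (2G - R - B) / (2G + R + B), computed from the red, green, and blue bands of a visible-light image. The short Python sketch below illustrates how such an index could be derived per pixel before being used to mitigate radiometric differences; the function name, array layout, and example tile are illustrative assumptions, not part of the published method.

```python
import numpy as np

def vdvi(rgb):
    """Visible-band Difference Vegetation Index: (2G - R - B) / (2G + R + B).

    Expects an array of shape (H, W, 3) with channels ordered R, G, B.
    Returns a float array in roughly [-1, 1]; vegetation tends toward positive values.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    denom = 2.0 * g + r + b
    denom[denom == 0] = 1e-6  # guard against division by zero on all-black pixels
    return (2.0 * g - r - b) / denom

# Usage example on a synthetic 8-bit visible-light tile (hypothetical data).
tile = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
index = vdvi(tile)
print(index.min(), index.max())
```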
ISSN: 1569-8432
1872-826X
DOI: 10.1016/j.jag.2023.103642