Global Information Interactive of ViT for Rain Removal on Single Images

Bibliographic Details
Published in: 2023 China Automation Congress (CAC), pp. 8411 - 8416
Main Authors: Li, Ce; Zhao, Shutian; Huang, Fan; Ma, Pengfei; Ma, Lin; Chen, Huizhong
Format: Conference Proceeding
Language: English
Published: IEEE, 17.11.2023

Summary: In adverse weather conditions, the quality of outdoor captured images is significantly compromised, primarily due to the presence of rain streaks that severely degrade image clarity and consequently disrupt image recognition and analysis. The spatial variations of rain streaks within individual rainy images pose a considerable challenge for their removal. Despite the notable achievements of Convolutional Neural Network (CNN)-based methods in recent years, their limited receptive fields and lack of adaptability to input content make it difficult for them to cope with real-world scenarios and restore high-quality rain-free images with accurate structural details. To address this issue, we propose an image restoration model based on the Vision Transformer (ViT) architecture. Specifically, our model first employs a U-ViT to capture contextual information, which is then fused with a high-resolution branch that preserves local details. At each stage, attention mechanisms are introduced to reweight local features, leading to the design of a novel information interaction pattern. The model achieves accelerated convergence by performing global information interaction during rain removal, resulting in image restoration that closely resembles the reference images. Furthermore, to mitigate the loss of fine-grained details, we introduce a detail fusion module. The resulting tightly interconnected hierarchical structure is referred to as GIVTNet. Experimental results demonstrate that the proposed algorithm yields substantial performance improvements across multiple synthetic and real-world datasets.
ISSN: 2688-0938
DOI: 10.1109/CAC59555.2023.10450755
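
Note: this record contains no implementation details. The following is a minimal, hypothetical Python (PyTorch) sketch of the dual-branch design described in the summary above: a U-shaped ViT branch for global context, a full-resolution branch for local detail, attention-based reweighting of features before fusion, and a detail fusion step producing the rain-free output. Every class name, layer choice, and channel count below (GIVTNetSketch, TransformerStage, dim=32, and so on) is an illustrative assumption, not the authors' published architecture.

# Hypothetical sketch of a GIVTNet-style dual-branch deraining network.
# All module names, channel counts, and layer choices are assumptions made
# for illustration; the paper does not publish these details in this record.
import torch
import torch.nn as nn


class TransformerStage(nn.Module):
    """One global self-attention stage of the assumed U-shaped ViT branch."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(),
                                 nn.Linear(dim * 2, dim))

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)        # tokens: (B, H*W, C)
        t = t + self.attn(self.norm(t), self.norm(t), self.norm(t))[0]
        t = t + self.mlp(self.norm(t))
        return t.transpose(1, 2).reshape(b, c, h, w)


class GIVTNetSketch(nn.Module):
    """Context (U-ViT) branch + high-resolution branch + attention reweighting
    + detail fusion, following the summary at a coarse level.
    Assumes input height and width are divisible by 2."""
    def __init__(self, dim=32):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, 3, padding=1)
        # U-shaped ViT branch: downsample, attend globally, upsample.
        self.down = nn.Conv2d(dim, dim, 4, stride=2, padding=1)
        self.vit = TransformerStage(dim)
        self.up = nn.ConvTranspose2d(dim, dim, 4, stride=2, padding=1)
        # High-resolution branch that keeps local detail at full resolution.
        self.hires = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(dim, dim, 3, padding=1))
        # Attention-based reweighting of concatenated features (channel attention).
        self.reweight = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                      nn.Conv2d(2 * dim, 2 * dim, 1), nn.Sigmoid())
        # Detail fusion module merging both branches into an RGB residual.
        self.fuse = nn.Conv2d(2 * dim, 3, 3, padding=1)

    def forward(self, rainy):                   # rainy: (B, 3, H, W)
        feat = self.embed(rainy)
        ctx = self.up(self.vit(self.down(feat)))    # global context path
        loc = self.hires(feat)                      # local detail path
        both = torch.cat([ctx, loc], dim=1)
        both = both * self.reweight(both)           # reweight before fusion
        return rainy + self.fuse(both)              # predicted rain-free image


if __name__ == "__main__":
    # Quick shape check on a random 64x64 "rainy" image.
    net = GIVTNetSketch()
    out = net(torch.randn(1, 3, 64, 64))
    print(out.shape)  # torch.Size([1, 3, 64, 64])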