Learning A Sparse Transformer Network for Effective Image Deraining

Bibliographic Details
Published in Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 5896-5905
Main Authors Chen, Xiang; Li, Hao; Li, Mingqiang; Pan, Jinshan
Format Conference Proceeding
Language English
Published IEEE 01.06.2023
Subjects
Online Access Get full text
ISSN 1063-6919
DOI 10.1109/CVPR52729.2023.00571


Abstract Transformer-based methods have achieved significant performance in image deraining, as they can model the non-local information that is vital for high-quality image reconstruction. In this paper, we find that most existing Transformers usually use all similarities of the tokens from the query-key pairs for feature aggregation. However, if the tokens from the query differ from those of the key, the self-attention values estimated from these tokens are also involved in feature aggregation, which accordingly interferes with clear image restoration. To overcome this problem, we propose an effective DeRaining network, the Sparse Transformer (DRSformer), that can adaptively keep the most useful self-attention values for feature aggregation so that the aggregated features better facilitate high-quality image reconstruction. Specifically, we develop a learnable top-k selection operator to adaptively retain the most crucial attention scores from the keys for each query for better feature aggregation. Simultaneously, as the naive feed-forward network in Transformers does not model the multi-scale information that is important for latent clear image restoration, we develop an effective mixed-scale feed-forward network to generate better features for image deraining. To learn an enriched set of hybrid features that combines local context from CNN operators, we equip our model with a mixture-of-experts feature compensator to present a cooperative refinement deraining scheme. Extensive experimental results on commonly used benchmarks demonstrate that the proposed method achieves favorable performance against state-of-the-art approaches. The source code and trained models are available at https://github.com/cschenxiang/DRSformer.
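The core idea in the abstract — keeping only the most useful self-attention values per query — can be sketched as a top-k masking step before the softmax. This is a simplified NumPy illustration, not the paper's implementation: DRSformer learns the selection adaptively inside a Transformer block, whereas here `topk` is a fixed parameter and the function names are my own.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; exp(-inf) -> 0, so masked scores get zero weight.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(q, k, v, topk):
    """Attention that keeps only the top-k similarity scores per query.

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v). Scores outside the top-k
    are set to -inf so they contribute nothing after the softmax.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (n_q, n_k) query-key similarities
    # Column indices of the k largest scores in each row.
    idx = np.argpartition(scores, -topk, axis=-1)[:, -topk:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)         # 0 where kept, -inf where dropped
    attn = softmax(scores + mask, axis=-1)             # sparse attention weights
    return attn @ v
```

With `topk` equal to the number of keys this reduces to ordinary dense attention, which makes the sparsification easy to sanity-check.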
Author Pan, Jinshan
Li, Mingqiang
Li, Hao
Chen, Xiang
Author_xml – sequence: 1
  givenname: Xiang
  surname: Chen
  fullname: Chen, Xiang
  organization: School of Computer Science and Engineering, Nanjing University of Science and Technology
– sequence: 2
  givenname: Hao
  surname: Li
  fullname: Li, Hao
  organization: School of Computer Science and Engineering, Nanjing University of Science and Technology
– sequence: 3
  givenname: Mingqiang
  surname: Li
  fullname: Li, Mingqiang
  organization: Information Science Academy, China Electronics Technology Group Corporation
– sequence: 4
  givenname: Jinshan
  surname: Pan
  fullname: Pan, Jinshan
  organization: School of Computer Science and Engineering, Nanjing University of Science and Technology
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/CVPR52729.2023.00571
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 9798350301298
EISSN 1063-6919
EndPage 5905
ExternalDocumentID 10204775
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: U22B2049,U19B2040,61922043,61872421,62272230
  funderid: 10.13039/501100001809
– fundername: National Key R&D Program of China
  grantid: 2018AAA0102001
  funderid: 10.13039/501100012166
– fundername: Fundamental Research Funds for the Central Universities
  grantid: 30920041109
  funderid: 10.13039/501100012226
GroupedDBID 6IE
6IH
6IL
6IN
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IJVOP
OCL
RIE
RIL
RIO
IEDL.DBID RIE
IngestDate Wed Aug 27 02:56:29 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 10
ParticipantIDs ieee_primary_10204775
PublicationCentury 2000
PublicationDate 2023-June
PublicationDateYYYYMMDD 2023-06-01
PublicationDate_xml – month: 06
  year: 2023
  text: 2023-June
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0003211698
Score 2.6593854
SourceID ieee
SourceType Publisher
StartPage 5896
SubjectTerms Adaptation models
Benchmark testing
Computational modeling
Computer vision
Image restoration
Low-level vision
Source coding
Transformers
Title Learning A Sparse Transformer Network for Effective Image Deraining
URI https://ieeexplore.ieee.org/document/10204775
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE