Neural Image Compression via Attentional Multi-scale Back Projection and Frequency Decomposition

Bibliographic Details
Published in Proceedings / IEEE International Conference on Computer Vision, pp. 14657-14666
Main Authors Gao, Ge; You, Pei; Pan, Rong; Han, Shunyuan; Zhang, Yuanyuan; Dai, Yuchao; Lee, Hojae
Format Conference Proceeding
Language English
Published IEEE 01.10.2021
ISSN 2380-7504
DOI 10.1109/ICCV48922.2021.01441

Abstract In recent years, neural image compression has emerged as a rapidly developing topic in computer vision, and state-of-the-art approaches now achieve better compression performance than their conventional counterparts. Despite this progress, current methods are still limited in preserving fine spatial details for optimal reconstruction, especially at low compression rates. We make three contributions to address this issue. First, we develop a novel back projection method with attentional and multi-scale feature fusion for augmented representation power. Our back projection method recalibrates the current estimate by establishing feedback connections between high-level and low-level attributes in an attentional and discriminative manner. Second, we propose to decompose the input image and process the distinct frequency components separately; the derived latents are recombined using a novel dual attention module, so that details inside regions of interest can be explicitly manipulated. Third, we propose a novel training scheme for reducing the latent rounding residual. Experimental results show that, when measured in PSNR, our model reduces BD-rate by 9.88% and 10.32% over the state-of-the-art method, and by 4.12% and 4.32% over the latest coding standard, Versatile Video Coding (VVC), on the Kodak and CLIC2020 Professional Validation datasets, respectively. Our approach also produces more visually pleasing images when optimized for MS-SSIM. The significant improvement over existing methods shows the effectiveness of our method in preserving and remedying spatial information for enhanced compression quality.
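Note: the abstract refers to three quantities without defining them: PSNR, a split of the input into frequency components, and the latent rounding residual. The following is a minimal illustrative sketch of those generic ideas only, and is not the authors' implementation. The function names `psnr`, `split_frequencies`, and `rounding_residual` are hypothetical, the moving-average filter is an assumed stand-in for whatever decomposition the paper actually uses, and signals are 1-D lists for brevity.

```python
import math


def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized value lists."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)


def split_frequencies(signal, radius=1):
    """Split a 1-D signal into a smooth (low) part and a detail (high) part.

    Assumption: a moving average serves as the low-pass filter; the high
    part is whatever the low part leaves out, so low + high == signal.
    """
    n = len(signal)
    low = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        low.append(sum(window) / len(window))
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def rounding_residual(latent):
    """Information lost when each latent value is rounded to an integer."""
    return [y - round(y) for y in latent]
```

For example, `split_frequencies([1.0, 2.0, 4.0, 8.0])` returns two lists whose element-wise sum reproduces the input, and every entry of `rounding_residual(...)` lies in the interval [-0.5, 0.5].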
Author Dai, Yuchao
Han, Shunyuan
Zhang, Yuanyuan
Gao, Ge
Pan, Rong
Lee, Hojae
You, Pei
Author_xml – sequence: 1
  givenname: Ge
  surname: Gao
  fullname: Gao, Ge
  email: ge1.gao@samsung.com
  organization: Samsung R&D Institute China,Xi'an,China
– sequence: 2
  givenname: Pei
  surname: You
  fullname: You, Pei
  email: pei.you@samsung.com
  organization: Samsung R&D Institute China,Xi'an,China
– sequence: 3
  givenname: Rong
  surname: Pan
  fullname: Pan, Rong
  email: rong.pan@samsung.com
  organization: Samsung R&D Institute China,Xi'an,China
– sequence: 4
  givenname: Shunyuan
  surname: Han
  fullname: Han, Shunyuan
  email: shuny.han@samsung.com
  organization: Samsung R&D Institute China,Xi'an,China
– sequence: 5
  givenname: Yuanyuan
  surname: Zhang
  fullname: Zhang, Yuanyuan
  email: yuan2.zhang@samsung.com
  organization: Samsung R&D Institute China,Xi'an,China
– sequence: 6
  givenname: Yuchao
  surname: Dai
  fullname: Dai, Yuchao
  email: daiyuchao@nwpu.edu.cn
  organization: Northwestern Polytechnical University,Xi'an,China
– sequence: 7
  givenname: Hojae
  surname: Lee
  fullname: Lee, Hojae
  email: hojae72.lee@samsung.com
  organization: Samsung R&D Institute China,Xi'an,China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ICCV48922.2021.01441
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 9781665428125
1665428120
EISSN 2380-7504
EndPage 14666
ExternalDocumentID 9709897
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  funderid: 10.13039/501100001809
– fundername: National Key Research and Development Program of China
  funderid: 10.13039/501100012166
GroupedDBID 29O
6IE
6IF
6IH
6IK
6IL
6IM
6IN
AAJGR
AAWTH
ACGFS
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IPLJI
M43
OCL
RIE
RIL
RIO
RNS
ID FETCH-LOGICAL-i203t-d6a06ec558a3ab840de71394c862b1defeddde138ef9a20983c1fe2064fc545b3
IEDL.DBID RIE
IngestDate Wed Aug 27 02:24:27 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i203t-d6a06ec558a3ab840de71394c862b1defeddde138ef9a20983c1fe2064fc545b3
PageCount 10
ParticipantIDs ieee_primary_9709897
PublicationCentury 2000
PublicationDate 2021-Oct.
PublicationDateYYYYMMDD 2021-10-01
PublicationDate_xml – month: 10
  year: 2021
  text: 2021-Oct.
PublicationDecade 2020
PublicationTitle Proceedings / IEEE International Conference on Computer Vision
PublicationTitleAbbrev ICCV
PublicationYear 2021
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0039286
Score 2.4651344
SourceID ieee
SourceType Publisher
StartPage 14657
SubjectTerms Computer vision
Estimation
Image and video synthesis; Low-level and physics-based vision; Neural generative models; Vision applications and systems
Image coding
Next generation networking
Training
Video coding
Title Neural Image Compression via Attentional Multi-scale Back Projection and Frequency Decomposition
URI https://ieeexplore.ieee.org/document/9709897
hasFullText 1
inHoldings 1
isFullTextHit
isPrint