Neural Image Compression via Attentional Multi-scale Back Projection and Frequency Decomposition

Bibliographic Details
Published in	Proceedings / IEEE International Conference on Computer Vision pp. 14657 - 14666
Main Authors	Gao, Ge; You, Pei; Pan, Rong; Han, Shunyuan; Zhang, Yuanyuan; Dai, Yuchao; Lee, Hojae
Format	Conference Proceeding
Language	English
Published	IEEE 01.10.2021
Subjects	Computer vision; Estimation; Image and video synthesis; Low-level and physics-based vision; Neural generative models; Vision applications and systems; Image coding; Next generation networking; Training; Video coding
ISSN	2380-7504
DOI	10.1109/ICCV48922.2021.01441

Abstract	In recent years, neural image compression has emerged as a rapidly developing topic in computer vision, and state-of-the-art approaches now outperform their conventional counterparts in compression performance. Despite this progress, current methods remain limited in preserving fine spatial details for optimal reconstruction, especially at low compression rates. We make three contributions to address this issue. First, we develop a novel back projection method with attentional and multi-scale feature fusion for augmented representation power. Our back projection method recalibrates the current estimation by establishing feedback connections between high-level and low-level attributes in an attentional and discriminative manner. Second, we propose to decompose the input image and process the distinct frequency components separately; the derived latents are recombined using a novel dual attention module, so that details inside regions of interest can be explicitly manipulated. Third, we propose a novel training scheme for reducing the latent rounding residual. Experimental results show that, when measured in PSNR, our model reduces BD-rate by 9.88% and 10.32% over the state-of-the-art method, and by 4.12% and 4.32% over the latest coding standard, Versatile Video Coding (VVC), on the Kodak and CLIC2020 Professional Validation datasets, respectively. Our approach also produces more visually pleasing images when optimized for MS-SSIM. The significant improvement over existing methods demonstrates the effectiveness of our method in preserving and remedying spatial information for enhanced compression quality.
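Two ideas named in the abstract, frequency decomposition and the latent rounding residual, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say which filter or network performs the decomposition, so a moving-average low-pass over a single pixel row is assumed here purely for illustration, and the function names (`box_lowpass`, `decompose`, `rounding_residual`) are hypothetical.

```python
# Illustrative sketch only; the filter choice and all names are assumptions,
# not the method described in the paper.

def box_lowpass(signal, radius=1):
    """Low-pass a 1-D signal with a moving average (window is clipped at the edges)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def decompose(signal, radius=1):
    """Split a signal into low- and high-frequency parts that sum back to it."""
    low = box_lowpass(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def rounding_residual(latent):
    """Quantize a latent by hard rounding and return (quantized, residual)."""
    quantized = [round(v) for v in latent]
    residual = [v - q for v, q in zip(latent, quantized)]
    return quantized, residual


# One pixel row with a sharp edge in the middle (made-up values).
row = [10.0, 12.0, 50.0, 52.0, 11.0, 9.0]
low, high = decompose(row)
recon = [l + h for l, h in zip(low, high)]  # recombining recovers the row

# A made-up latent vector; the residual is what rounding discards.
latent = [0.2, -1.7, 3.49, 2.6]
quantized, residual = rounding_residual(latent)
```

In this toy setting the high-frequency part carries the edge detail that the abstract says is hard to preserve, and the residual is always within half a quantization step of zero. The training scheme the paper proposes for shrinking that residual is not reproduced here.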
Author	Dai, Yuchao; Han, Shunyuan; Zhang, Yuanyuan; Gao, Ge; Pan, Rong; Lee, Hojae; You, Pei
Author_xml	1. Gao, Ge (ge1.gao@samsung.com), Samsung R&D Institute China, Xi'an, China; 2. You, Pei (pei.you@samsung.com), Samsung R&D Institute China, Xi'an, China; 3. Pan, Rong (rong.pan@samsung.com), Samsung R&D Institute China, Xi'an, China; 4. Han, Shunyuan (shuny.han@samsung.com), Samsung R&D Institute China, Xi'an, China; 5. Zhang, Yuanyuan (yuan2.zhang@samsung.com), Samsung R&D Institute China, Xi'an, China; 6. Dai, Yuchao (daiyuchao@nwpu.edu.cn), Northwestern Polytechnical University, Xi'an, China; 7. Lee, Hojae (hojae72.lee@samsung.com), Samsung R&D Institute China, Xi'an, China
CODEN	IEEPAD
ContentType	Conference Proceeding
Discipline	Applied Sciences
EISBN	9781665428125 1665428120
EISSN	2380-7504
EndPage	14666
ExternalDocumentID	9709897
Genre	orig-research
GrantInformation_xml	– fundername: National Natural Science Foundation of China funderid: 10.13039/501100001809 – fundername: National Key Research and Development Program of China funderid: 10.13039/501100012166
IsPeerReviewed	false
IsScholarly	true
Language	English
PageCount	10
PublicationCentury	2000
PublicationDate	2021-Oct.
PublicationDateYYYYMMDD	2021-10-01
PublicationDate_xml	– month: 10 year: 2021 text: 2021-Oct.
PublicationDecade	2020
PublicationTitle	Proceedings / IEEE International Conference on Computer Vision
PublicationTitleAbbrev	ICCV
PublicationYear	2021
Publisher	IEEE
Publisher_xml	– name: IEEE
StartPage	14657
Title	Neural Image Compression via Attentional Multi-scale Back Projection and Frequency Decomposition
URI	https://ieeexplore.ieee.org/document/9709897