[Paper] Deep Learning-based RGBA Image Compression with Masked Window-based Attention

Bibliographic Details
Published in ITE TRANSACTIONS ON MEDIA TECHNOLOGY AND APPLICATIONS Vol. 13; no. 2; pp. 200 - 210
Main Authors Inazu, Yoshiki; Kimata, Hideaki
Format Journal Article
Language English
Published The Institute of Image Information and Television Engineers 2025
Abstract RGBA images, which include an alpha channel for transparency, are common in real-world applications. Traditional RGBA compression methods apply the same method to both the RGB channels and the alpha channel, which can lead to suboptimal results because the two have different characteristics. This paper proposes a deep neural network that introduces attention modules individually suited to the RGB signal and the alpha channel. The proposed method consists of two networks, one for the RGB signal and one for the alpha channel, with an appropriate attention module applied in each. In particular, a new attention module that focuses on the unmasked regions of the alpha channel is applied. In the evaluation, the proposed method is compared with a simple deep neural network whose input and output layers are extended from three to four channels, and with classical RGBA image compression methods.
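The idea of an attention module that focuses on the unmasked regions of the alpha channel can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no architectural details, so the function name `masked_window_attention`, the `window` parameter, the use of non-overlapping windows, the absence of learned query/key/value projections, and the rule "exclude masked positions as attention keys" are all assumptions made only for illustration.

```python
import numpy as np

def masked_window_attention(x, mask, window=4):
    """Illustrative masked window attention (hypothetical, not the paper's module).

    x:    (H, W, C) float feature map.
    mask: (H, W) bool array, True where the pixel is unmasked
          (assumed here to mean non-transparent in the alpha channel).
    Self-attention is computed inside each non-overlapping window, and
    masked positions are excluded as keys so they receive zero weight.
    """
    H, W, C = x.shape
    assert H % window == 0 and W % window == 0, "H and W must be multiples of window"
    out = np.zeros_like(x)
    for i in range(0, H, window):
        for j in range(0, W, window):
            tokens = x[i:i + window, j:j + window].reshape(-1, C)  # (N, C)
            keep = mask[i:i + window, j:j + window].reshape(-1)    # (N,)
            if not keep.any():
                continue  # fully masked window: output stays zero
            # Simplification: queries, keys and values are the raw tokens.
            scores = tokens @ tokens.T / np.sqrt(C)                # (N, N)
            scores[:, ~keep] = -np.inf                             # drop masked keys
            scores -= scores.max(axis=1, keepdims=True)            # numerical stability
            weights = np.exp(scores)
            weights /= weights.sum(axis=1, keepdims=True)
            out[i:i + window, j:j + window] = (weights @ tokens).reshape(window, window, C)
    return out
```

Under these assumptions, changing the features at a masked position does not affect the output at any other position in the same window, which is one plausible reading of "focusing on the unmasked regions".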
Author Inazu, Yoshiki (Graduate School of Engineering, Kogakuin University)
Kimata, Hideaki (Graduate School of Engineering, Kogakuin University)
ContentType Journal Article
Copyright 2025 The Institute of Image Information and Television Engineers
DOI 10.3169/mta.13.200
EISSN 2186-7364
ISSN 2186-7364
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
OpenAccessLink https://www.jstage.jst.go.jp/article/mta/13/2/13_200/_article/-char/en
PageCount 11
PublicationTitle ITE TRANSACTIONS ON MEDIA TECHNOLOGY AND APPLICATIONS
PublicationTitleAlternate MTA
SubjectTerms alpha channel
deep learning
image compression
masked window-based attention
RGBA