[Paper] Deep Learning-based RGBA Image Compression with Masked Window-based Attention

Bibliographic Details
Published in	ITE TRANSACTIONS ON MEDIA TECHNOLOGY AND APPLICATIONS Vol. 13; no. 2; pp. 200 - 210
Main Authors	Inazu, Yoshiki; Kimata, Hideaki
Format	Journal Article
Language	English
Published	The Institute of Image Information and Television Engineers 2025
Subjects	alpha channel; deep learning; image compression; masked window-based attention; RGBA

Abstract	RGBA images, which include an alpha channel for transparency, are common in real-world applications. Traditional RGBA compression methods apply the same method to both the RGB channels and the alpha channel, which can lead to suboptimal results because the two have different characteristics. This paper proposes a deep neural network that introduces attention modules individually suited to the RGB signal and the alpha channel. The proposed method consists of two networks, one for the RGB signal and one for the alpha channel, with an appropriate attention module applied in each. In particular, a new attention module that focuses on the unmasked regions of the alpha channel is applied. In the evaluation, the proposed method is compared with classical RGBA image compression methods and with a simple deep neural network whose input and output layers are extended from three to four channels.
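The abstract names an attention module that focuses on the unmasked regions of the alpha channel but gives no implementation details. The following is a minimal sketch of one plausible reading, not the authors' method: self-attention inside a single window in which only unmasked positions may serve as keys and values. The function names split_rgba and masked_window_attention, the rule "alpha > 0 means unmasked", and the absence of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def split_rgba(pixels):
    """Split (r, g, b, a) pixels into RGB triples, alpha values, and a 0/1 mask.

    Assumption: a pixel counts as unmasked when its alpha value is above zero.
    """
    rgb = [p[:3] for p in pixels]
    alpha = [p[3] for p in pixels]
    mask = [1 if a > 0 else 0 for a in alpha]
    return rgb, alpha, mask


def masked_window_attention(tokens, mask):
    """Self-attention within one window, restricted to unmasked keys.

    tokens: list of feature vectors, one per position in the window
    mask:   list of 0/1 flags, where 1 marks an unmasked position

    Hypothetical simplification: queries, keys, and values are the raw
    tokens; a real network would apply learned projections first.
    """
    dim = len(tokens[0])
    visible = [j for j, m in enumerate(mask) if m]
    if not visible:
        # Nothing to attend to: pass the window through unchanged.
        return [list(t) for t in tokens]
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        # Scaled dot-product scores against unmasked keys only.
        scores = [scale * sum(a * b for a, b in zip(q, tokens[j])) for j in visible]
        peak = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * tokens[j][c] for w, j in zip(weights, visible)) / total
            for c in range(dim)
        ])
    return out


if __name__ == "__main__":
    pixels = [(255, 0, 0, 255), (0, 255, 0, 0), (0, 0, 255, 128), (10, 10, 10, 0)]
    _, _, mask = split_rgba(pixels)
    window = [[1.0, 0.0], [100.0, 100.0], [0.0, 1.0], [-50.0, 7.0]]
    for row in masked_window_attention(window, mask):
        print(row)
```

In this sketch every output vector is a weighted average of the unmasked tokens only, so features at fully transparent positions never influence the result. Whether the paper masks keys, queries, or both is not stated in the abstract.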
Author	Inazu, Yoshiki; Kimata, Hideaki
Author_xml	– sequence: 1 fullname: Inazu, Yoshiki organization: Graduate School of Engineering, Kogakuin University – sequence: 2 fullname: Kimata, Hideaki organization: Graduate School of Engineering, Kogakuin University
Cites_doi	10.1109/TIP.2021.3058615 10.1109/CVPR52688.2022.01697 10.1109/TCSVT.2012.2221191 10.1109/CVPR52729.2023.01383 10.1109/CVPR42600.2020.00796 10.1145/3655755.3655769 10.1109/ICIP40778.2020.9190935 10.17487/rfc2083 10.1109/ICCV48922.2021.00986
ContentType	Journal Article
Copyright	2025 The Institute of Image Information and Television Engineers
Copyright_xml	– notice: 2025 The Institute of Image Information and Television Engineers
DBID	AAYXX CITATION
DOI	10.3169/mta.13.200
DatabaseName	CrossRef
DatabaseTitle	CrossRef
DatabaseTitleList
DeliveryMethod	fulltext_linktorsrc
EISSN	2186-7364
EndPage	210
ExternalDocumentID	10_3169_mta_13_200 article_mta_13_2_13_200_article_char_en
GroupedDBID	ALMA_UNASSIGNED_HOLDINGS JSF JSH KQ8 OK1 RJT RZJ AAYXX CITATION
ISSN	2186-7364
IngestDate	Sun Jul 06 05:09:30 EDT 2025 Thu May 08 13:50:30 EDT 2025
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	true
Issue	2
Language	English
LinkModel	OpenURL
OpenAccessLink	https://www.jstage.jst.go.jp/article/mta/13/2/13_200/_article/-char/en
PageCount	11
ParticipantIDs	crossref_primary_10_3169_mta_13_200 jstage_primary_article_mta_13_2_13_200_article_char_en
PublicationCentury	2000
PublicationDate	2025 2025-00-00
PublicationDateYYYYMMDD	2025-01-01
PublicationDate_xml	– year: 2025 text: 2025
PublicationDecade	2020
PublicationTitle	ITE TRANSACTIONS ON MEDIA TECHNOLOGY AND APPLICATIONS
PublicationTitleAlternate	MTA
PublicationYear	2025
Publisher	The Institute of Image Information and Television Engineers
Publisher_xml	– name: The Institute of Image Information and Television Engineers
SSID	ssj0003220642
Score	2.2925184
SourceID	crossref jstage
SourceType	Index Database Publisher
StartPage	200
SubjectTerms	alpha channel; deep learning; image compression; masked window-based attention; RGBA
Title	[Paper] Deep Learning-based RGBA Image Compression with Masked Window-based Attention
URI	https://www.jstage.jst.go.jp/article/mta/13/2/13_200/_article/-char/en
Volume	13
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
ispartofPNX	ITE Transactions on Media Technology and Applications, 2025, Vol.13(2), pp.200-210
linkProvider	Colorado Alliance of Research Libraries