[Paper] Deep Learning-based RGBA Image Compression with Masked Window-based Attention
RGBA images, which include an alpha channel for transparency, are common in real-world applications. Traditional RGBA compression methods apply the same method to both the RGB and alpha channels, which can lead to suboptimal results because the two have different characteristics. This paper proposes a deep...
Published in | ITE TRANSACTIONS ON MEDIA TECHNOLOGY AND APPLICATIONS Vol. 13; no. 2; pp. 200 - 210 |
---|---|
Main Authors | Inazu, Yoshiki; Kimata, Hideaki |
Format | Journal Article |
Language | English |
Published | The Institute of Image Information and Television Engineers, 2025 |
Subjects | alpha channel; deep learning; image compression; masked window-based attention; RGBA |
Abstract | RGBA images, which include an alpha channel for transparency, are common in real-world applications. Traditional RGBA compression methods apply the same method to both the RGB and alpha channels, which can lead to suboptimal results because the two have different characteristics. This paper proposes a deep neural network that introduces attention modules individually suited to the RGB signal and the alpha channel. The proposed method consists of two networks, one for the RGB signal and one for the alpha channel, with an appropriate attention module applied in each. In particular, a new attention module that focuses on the unmasked regions of the alpha channel is applied. In the evaluation, the proposed method is compared with classical RGBA image compression methods and with a simple deep neural network whose input and output layers are extended from three to four channels. |
---|---|
Author | Inazu, Yoshiki; Kimata, Hideaki |
Author_xml | 1. Inazu, Yoshiki (Graduate School of Engineering, Kogakuin University); 2. Kimata, Hideaki (Graduate School of Engineering, Kogakuin University) |
Cites_doi | 10.1109/TIP.2021.3058615 10.1109/CVPR52688.2022.01697 10.1109/TCSVT.2012.2221191 10.1109/CVPR52729.2023.01383 10.1109/CVPR42600.2020.00796 10.1145/3655755.3655769 10.1109/ICIP40778.2020.9190935 10.17487/rfc2083 10.1109/ICCV48922.2021.00986 |
ContentType | Journal Article |
Copyright | 2025 The Institute of Image Information and Television Engineers |
DOI | 10.3169/mta.13.200 |
EISSN | 2186-7364 |
EndPage | 210 |
ISSN | 2186-7364 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 2 |
Language | English |
OpenAccessLink | https://www.jstage.jst.go.jp/article/mta/13/2/13_200/_article/-char/en |
PageCount | 11 |
PublicationCentury | 2000 |
PublicationDate | 2025 2025-00-00 |
PublicationDateYYYYMMDD | 2025-01-01 |
PublicationDecade | 2020 |
PublicationTitle | ITE TRANSACTIONS ON MEDIA TECHNOLOGY AND APPLICATIONS |
PublicationTitleAlternate | MTA |
PublicationYear | 2025 |
Publisher | The Institute of Image Information and Television Engineers |
StartPage | 200 |
SubjectTerms | alpha channel; deep learning; image compression; masked window-based attention; RGBA |
Title | [Paper] Deep Learning-based RGBA Image Compression with Masked Window-based Attention |
URI | https://www.jstage.jst.go.jp/article/mta/13/2/13_200/_article/-char/en |
Volume | 13 |
ispartofPNX | ITE Transactions on Media Technology and Applications, 2025, Vol.13(2), pp.200-210 |
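
The abstract mentions an attention module that focuses on the unmasked regions of the alpha channel but gives no implementation details. The following is only a minimal sketch of one way such masked window-based attention could work, not the authors' module. It assumes non-overlapping square windows, identity query/key/value projections (a trained model would learn these), and a binary mask where 1 marks an unmasked pixel; the function name `masked_window_attention` and its parameters are hypothetical.

```python
import numpy as np


def masked_window_attention(x, mask, window=4):
    """Illustrative masked window attention (hypothetical, not the paper's code).

    x:    (H, W, C) feature map.
    mask: (H, W) array, 1 = unmasked pixel, 0 = masked pixel (assumed convention).
    H and W must be multiples of `window`.
    Returns the attended features (H, W, C) and the per-window attention weights.
    """
    H, W, C = x.shape
    assert H % window == 0 and W % window == 0
    nh, nw = H // window, W // window
    T = window * window  # tokens per window

    # Partition features and mask into non-overlapping windows.
    xw = x.reshape(nh, window, nw, window, C).transpose(0, 2, 1, 3, 4)
    xw = xw.reshape(nh * nw, T, C)
    mw = mask.reshape(nh, window, nw, window).transpose(0, 2, 1, 3)
    mw = mw.reshape(nh * nw, T)

    # Scaled dot-product scores inside each window (identity projections assumed).
    scores = xw @ xw.transpose(0, 2, 1) / np.sqrt(C)

    # Suppress masked keys so attention goes only to unmasked pixels.
    # A large negative value (not -inf) keeps fully masked windows free of NaNs.
    scores = np.where(mw[:, None, :] > 0, scores, -1e9)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Aggregate values, then zero the outputs at masked query positions.
    out = (weights @ xw) * mw[:, :, None]

    # Undo the window partition.
    out = out.reshape(nh, nw, window, window, C).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, C), weights
```

Under these assumptions, features at masked pixels have no influence on the output at unmasked pixels, which is one plausible reading of "focuses on the unmasked regions". The real module may differ in where the mask is applied, how windows are shifted, and how fully masked windows are handled.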