Effective Local-Global Transformer for Natural Image Matting

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 33, No. 8, p. 1
Main Authors: Hu, Liangpeng; Kong, Yating; Li, Jide; Li, Xiaoqiang
Format: Journal Article
Language: English
Published: New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.08.2023

Summary: Learning-based matting methods have long been dominated by convolutional neural networks. These methods mainly propagate the alpha matte according to the similarity between unknown and known regions. However, correlations between pixels in unknown and known regions are limited by the insufficient receptive fields of common convolutional neural networks, which leads to inaccurate estimates for unknown-region pixels that are far from known regions. In this paper, we propose an Effective Local-Global Transformer for natural image matting (ELGT-Matting), which further expands receptive fields to establish a wide range of correlations between unknown and known regions. The core module is the effective local-global transformer block, which consists of two components: 1) a Window-Level Global MSA (Multi-head Self-Attention) module, which learns global context features among windows; and 2) a Local-Global Window MSA module, which combines coarse global context features with the corresponding fine local window features to help local window self-attention capture both local and contextual information. Experiments demonstrate that ELGT-Matting performs favorably against other competitive approaches on the Composition-1K, Distinctions-646, and real-world AIM-500 datasets. In particular, it achieves a new state-of-the-art result on Composition-1K with an MSE of 0.00374.
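The two-stage attention scheme the summary describes can be illustrated with a minimal single-head NumPy sketch: window features are pooled into per-window tokens that attend to each other globally, and each window's resulting context token is then appended to that window's local keys/values. All function names, the mean-pooling step, and the concatenation detail here are assumptions for illustration, not the authors' actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v):
    # scaled dot-product attention for 2-D arrays: q (m, d), k/v (n, d)
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))
    return attn @ v

def local_global_window_attention(x, window=4):
    """x: (n, d) sequence of pixel features, with n divisible by `window`.

    Stage 1 (window-level global MSA): each window is mean-pooled to a
    single token, and these tokens attend to one another globally.
    Stage 2 (local-global window MSA): each window's global context token
    is appended to the local keys/values, so local self-attention sees
    both fine local features and coarse global context.
    """
    n, d = x.shape
    num_win = n // window
    wins = x.reshape(num_win, window, d)

    # Stage 1: global attention among pooled window tokens
    win_tokens = wins.mean(axis=1)                        # (num_win, d)
    global_ctx = self_attention(win_tokens, win_tokens, win_tokens)

    # Stage 2: local window attention over [local tokens; context token]
    out = np.empty_like(wins)
    for i in range(num_win):
        kv = np.vstack([wins[i], global_ctx[i:i + 1]])    # (window+1, d)
        out[i] = self_attention(wins[i], kv, kv)
    return out.reshape(n, d)
```

In the actual model these stages would be multi-headed, use learned projections, and operate on 2-D image windows; the sketch only shows how coarse global context can be injected into local window attention.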
ISSN: 1051-8215; 1558-2205
DOI: 10.1109/TCSVT.2023.3234983