Optimizing Transformer for Large-Hole Image Inpainting
| Published in | 2023 IEEE International Conference on Image Processing (ICIP), pp. 1180–1184 |
|---|---|
| Main Authors | , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 08.10.2023 |
Summary: In recent years, leveraging Convolutional Neural Networks (CNNs) to optimize Transformers (so-called hybrid models) has brought great progress to image inpainting. However, the slow growth of the CNN's effective receptive field when processing large-hole regions significantly limits overall performance. To alleviate this problem, this paper proposes a new Transformer-CNN hybrid framework (termed PUT+) that introduces the fast Fourier convolution (FFC) into the CNN-based refinement network. The framework builds on an improved Patch-based Vector Quantized Variational Auto-Encoder (P-VQVAE+): the encoder transforms the masked region into non-overlapping patch-based unquantized feature vectors, which serve as input to an Un-Quantized Transformer (UQ-Transformer); the decoder restores the masked region from the quantized features predicted by the UQ-Transformer while keeping the unmasked region unchanged. Extensive experimental results show that the proposed method outperforms the state of the art by a large margin, especially for image inpainting with large masked areas. The code is available at https://github.com/GZHU-DVL/PUTplus.
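The summary attributes the gain over a plain CNN refinement network to the fast Fourier convolution, whose spectral branch mixes channels in the frequency domain and therefore covers the whole image in a single layer, sidestepping the slow growth of a stacked-convolution receptive field. A minimal NumPy sketch of that spectral branch follows; the function name and weight layout are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def spectral_transform(x, w):
    """Spectral branch of a fast Fourier convolution (FFC), sketched in NumPy.

    x : (C, H, W) real-valued feature map
    w : (C_out, C) point-wise channel-mixing weights applied per frequency

    A 1x1 mixing in the Fourier domain touches every spatial location at
    once, so the effective receptive field is the entire feature map.
    """
    f = np.fft.rfft2(x, axes=(-2, -1))        # (C, H, W//2 + 1), complex
    f = np.einsum('oc,chw->ohw', w, f)        # mix channels at each frequency
    return np.fft.irfft2(f, s=x.shape[-2:], axes=(-2, -1))
```

With identity mixing weights the branch reproduces its input exactly. The actual FFC block additionally stacks real and imaginary parts, runs a small conv–BN–ReLU unit in the frequency domain, and sums the result with a local convolutional branch; those details are omitted here for brevity.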
DOI: 10.1109/ICIP49359.2023.10222218