Rethink arbitrary style transfer with transformer and contrastive learning
| Published in | Computer Vision and Image Understanding, Vol. 241, p. 103951 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Inc, 01.04.2024 |
| ISSN | 1077-3142; 1090-235X |
| DOI | 10.1016/j.cviu.2024.103951 |
Summary: Arbitrary style transfer attracts widespread research attention and has numerous practical applications. Existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce a technique to improve the quality of stylized images. First, we propose Style Consistency Instance Normalization (SCIN), a method to refine the alignment between content and style features. In addition, we develop an Instance-based Contrastive Learning (ICL) approach designed to learn the relationships among various styles, thereby enhancing the quality of the resulting stylized images. Recognizing that VGG networks are more adept at extracting classification features and are less well suited to capturing style features, we also introduce a Perception Encoder (PE) to capture style features. Extensive experiments demonstrate that our proposed method generates high-quality stylized images and effectively prevents artifacts compared with existing state-of-the-art methods.
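For context, the adaptive-normalization baseline the abstract contrasts against is the classic AdaIN operation, which shifts the content feature's per-channel statistics toward those of the style feature extracted by a fixed VGG. The sketch below shows that baseline together with a hypothetical SCIN-like layer that instead predicts the affine style parameters from the style feature itself; the layer names and the 1x1-conv predictor are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch, assuming PyTorch feature maps of shape (N, C, H, W).
import torch
import torch.nn as nn


def adain(content, style, eps=1e-5):
    """AdaIN baseline: match the content feature's per-channel mean/std
    to the style feature's per-channel mean/std (fixed-VGG statistics)."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean


class SCINLikeNorm(nn.Module):
    """Illustrative stand-in for SCIN: learn scale/shift from the style
    feature instead of using fixed statistics (hypothetical design)."""

    def __init__(self, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_beta = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, content, style):
        # Pool the style feature into a global descriptor, then predict
        # per-channel scale (gamma) and shift (beta) for the content.
        pooled = style.mean(dim=(2, 3), keepdim=True)   # (N, C, 1, 1)
        gamma = self.to_gamma(pooled)
        beta = self.to_beta(pooled)
        return gamma * self.norm(content) + beta
```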
• We propose a novel Style Consistency Instance Normalization (SCIN) to capture long-range and non-local style correlation. It aligns the content feature with the style feature rather than with the mean and variance computed by a fixed VGG.
• Because existing methods often generate low-quality stylized images with artifacts or with semantic errors, we introduce a novel Instance-based Contrastive Learning (ICL) to learn stylization-to-stylization relations and remove artifacts (a loss sketch in this spirit follows this list).
• We analyze the defects of attention-based arbitrary style transfer caused by a fixed VGG and propose a novel Perception Encoder (PE) that captures style information while avoiding excessive attention to the salient classification features of style images.
• Extensive experiments demonstrate that, compared with state-of-the-art methods, our approach learns detailed texture and global style correlation and removes artifacts.
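The ICL highlight describes learning relations among stylized instances. A common way to express such an instance-level objective is an InfoNCE-style loss that pulls together embeddings of stylizations sharing a style and pushes apart those with other styles; the sketch below is only in that spirit, and the pairing scheme, embedding source, and temperature are assumptions, not the paper's exact loss.

```python
# Minimal sketch of an instance-level contrastive (InfoNCE-style) loss.
import torch
import torch.nn.functional as F


def instance_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """anchor, positive: (N, D) embeddings of stylizations sharing a style;
    negatives: (N, K, D) embeddings of stylizations with other styles."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    # Cosine-similarity logits: one positive, K negatives per anchor.
    pos_logit = (anchor * positive).sum(dim=-1, keepdim=True) / temperature   # (N, 1)
    neg_logits = torch.einsum("nd,nkd->nk", anchor, negatives) / temperature  # (N, K)

    logits = torch.cat([pos_logit, neg_logits], dim=1)                        # (N, 1 + K)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)  # positive sits at index 0
```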