Rethink arbitrary style transfer with transformer and contrastive learning
| Published in | Computer Vision and Image Understanding, Vol. 241, p. 103951 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Inc, 01.04.2024 |
| ISSN | 1077-3142; 1090-235X |
| DOI | 10.1016/j.cviu.2024.103951 |
Summary: Arbitrary style transfer attracts widespread research attention and has numerous practical applications. Existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce a technique to improve the quality of stylized images. First, we propose Style Consistency Instance Normalization (SCIN), a method to refine the alignment between content and style features. In addition, we develop an Instance-based Contrastive Learning (ICL) approach designed to learn the relationships among various styles, thereby enhancing the quality of the resulting stylized images. Recognizing that VGG networks are more adept at extracting classification features and are less well suited to capturing style features, we also introduce a Perception Encoder (PE) to capture style features. Extensive experiments demonstrate that our proposed method generates high-quality stylized images and effectively prevents artifacts compared with existing state-of-the-art methods.
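For context, the adaptive-normalization baseline the abstract contrasts against is the classic AdaIN operation, which shifts the content feature's per-channel statistics toward those of the style feature extracted by a fixed VGG. The sketch below shows that baseline together with a hypothetical SCIN-like layer that instead predicts the affine style parameters from the style feature itself; the layer names and the 1x1-conv predictor are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch, assuming PyTorch feature maps of shape (N, C, H, W).
import torch
import torch.nn as nn


def adain(content, style, eps=1e-5):
    """AdaIN baseline: match the content feature's per-channel mean/std
    to the style feature's per-channel mean/std (fixed-VGG statistics)."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean


class SCINLikeNorm(nn.Module):
    """Illustrative stand-in for SCIN: learn scale/shift from the style
    feature instead of using fixed statistics (hypothetical design)."""

    def __init__(self, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Conv2d(channels, channels, kernel_size=1)
        self.to_beta = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, content, style):
        # Pool the style feature into a global descriptor, then predict
        # per-channel scale (gamma) and shift (beta) for the content.
        pooled = style.mean(dim=(2, 3), keepdim=True)   # (N, C, 1, 1)
        gamma = self.to_gamma(pooled)
        beta = self.to_beta(pooled)
        return gamma * self.norm(content) + beta
```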
• We propose a novel Style Consistency Instance Normalization (SCIN) to capture long-range and non-local style correlation. It aligns the content feature with the style feature rather than with the mean and variance computed by a fixed VGG.
• Because existing methods often generate low-quality stylized images with artifacts or with semantic errors, we introduce a novel Instance-based Contrastive Learning (ICL) to learn stylization-to-stylization relations and remove artifacts (a loss sketch in this spirit follows this list).
• We analyze the defects of attention-based arbitrary style transfer caused by a fixed VGG and propose a novel Perception Encoder (PE) that captures style information while avoiding excessive attention to the salient classification features of style images.
• Extensive experiments demonstrate that, compared with state-of-the-art methods, our approach learns detailed texture and global style correlation and removes artifacts.
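The ICL highlight describes learning relations among stylized instances. A common way to express such an instance-level objective is an InfoNCE-style loss that pulls together embeddings of stylizations sharing a style and pushes apart those with other styles; the sketch below is only in that spirit, and the pairing scheme, embedding source, and temperature are assumptions, not the paper's exact loss.

```python
# Minimal sketch of an instance-level contrastive (InfoNCE-style) loss.
import torch
import torch.nn.functional as F


def instance_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """anchor, positive: (N, D) embeddings of stylizations sharing a style;
    negatives: (N, K, D) embeddings of stylizations with other styles."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    # Cosine-similarity logits: one positive, K negatives per anchor.
    pos_logit = (anchor * positive).sum(dim=-1, keepdim=True) / temperature   # (N, 1)
    neg_logits = torch.einsum("nd,nkd->nk", anchor, negatives) / temperature  # (N, K)

    logits = torch.cat([pos_logit, neg_logits], dim=1)                        # (N, 1 + K)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)  # positive sits at index 0
```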