UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation

Bibliographic Details
Published in	Proceedings / IEEE Workshop on Applications of Computer Vision, pp. 702-712
Main Authors	Torbunov, Dmitrii; Huang, Yi; Yu, Haiwang; Huang, Jin; Yoo, Shinjae; Lin, Meifeng; Viren, Brett; Ren, Yihui
Format	Conference Proceeding
Language	English
Published	IEEE 01.01.2023
Subjects	Algorithms: Computational photography, image and video synthesis; Algorithms: Explainable, fair, accountable, privacy-preserving, ethical computer vision; Algorithms: Machine learning architectures, formulations, and algorithms (including transfer, low-shot, semi-, self-, and un-supervised learning); Benchmark testing; Computational modeling; Correlation; Inspection; Source coding; Training; Transformers
Summary:	Unpaired image-to-image translation has broad applications in art, design, and scientific simulations. One early breakthrough was CycleGAN, which emphasizes one-to-one mappings between two unpaired image domains via generative adversarial networks (GANs) coupled with the cycle-consistency constraint, while more recent works promote one-to-many mappings to boost the diversity of the translated images. Motivated by scientific simulation and one-to-one needs, this work revisits the classic CycleGAN framework and boosts its performance to outperform more contemporary models without relaxing the cycle-consistency constraint. To achieve this, we equip the generator with a Vision Transformer (ViT) and employ necessary training and regularization techniques. Compared to previous best-performing models, our model performs better and retains a strong correlation between the original and translated images. An accompanying ablation study shows that both the gradient penalty and self-supervised pre-training are crucial to the improvement. To promote reproducibility and open science, the source code, hyperparameter configurations, and pre-trained model are available at https://github.com/LS4GAN/uvcgan.
ISSN:	2642-9381
DOI:	10.1109/WACV56688.2023.00077