Rethinking the Truly Unsupervised Image-to-Image Translation

Bibliographic Details
Published in	2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14134-14143
Main Authors	Baek, Kyungjune, Choi, Yunjey, Uh, Youngjung, Yoo, Jaejun, Shim, Hyunjung
Format	Conference Proceeding
Language	English
Published	IEEE 01.01.2021
Subjects	Codes; Computational modeling; Computer vision; Data collection; Image and video synthesis; Representation learning; Semisupervised learning; Task analysis; Transfer/Low-shot/Semi/Unsupervised Learning

Summary:	Every recent image-to-image translation model inherently requires either image-level (i.e., input-output pairs) or set-level (i.e., domain labels) supervision. However, even set-level supervision can be a severe bottleneck for data collection in practice. In this paper, we tackle image-to-image translation in a fully unsupervised setting, i.e., with neither paired images nor domain labels. To this end, we propose a truly unsupervised image-to-image translation model (TUNIT) that simultaneously learns to separate image domains and to translate input images into the estimated domains. Experimental results show that our model achieves comparable or even better performance than the set-level supervised model trained with full labels, generalizes well on various datasets, and is robust against the choice of hyperparameters (e.g., the preset number of pseudo domains). Furthermore, TUNIT can be easily extended to semi-supervised learning with a small amount of labeled data.
ISSN:	2380-7504
DOI:	10.1109/ICCV48922.2021.01389