Temporal Consistency as Pretext Task in Unsupervised Domain Adaptation for Semantic Segmentation

Bibliographic Details
Published in: Journal of Intelligent & Robotic Systems, Vol. 111, No. 1, p. 37
Main Authors: Barbosa, Felipe; Osório, Fernando
Format: Journal Article
Language: English
Published: Dordrecht: Springer Netherlands (Springer Nature B.V.), 19.03.2025

Summary: Intelligent and autonomous robots (and vehicles) largely adopt computer vision systems to help in localization, navigation, and obstacle-avoidance tasks. By integrating different deep learning techniques, such as Object Detection and Image Semantic Segmentation, these systems achieve high accuracy in the domain they were trained on. Nonetheless, operating robustly across different domains still poses a major challenge to vision-based perception. In this sense, Unsupervised Domain Adaptation (UDA) has recently gained momentum due to its importance to real-world applications. Specifically, it leverages the ready availability of unlabeled data to design auxiliary sources of supervision that guide the knowledge transfer between domains. The advantages of such an approach are twofold: avoiding exhaustive labeling processes, and enhancing adaptation performance. In this scenario, exploiting temporal correlations in unlabeled video data stands as an interesting alternative, which has not yet been explored to its full potential. In this work, we propose a self-supervised learning framework that employs Temporal Consistency from unlabeled video sequences as a pretext task for improving UDA for Semantic Segmentation (UDASS). A simple yet effective strategy, it has shown promising results in a real-to-real adaptation setting. Our results and discussions are expected to benefit both new and experienced researchers on the subject.
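The abstract does not spell out how the temporal-consistency pretext task is formulated; the sketch below is one common, hypothetical instantiation, not the authors' method. It penalizes disagreement between per-pixel class distributions predicted on two consecutive frames via a symmetric KL divergence, under the simplifying assumption that the frames are already aligned (a real pipeline would typically warp one prediction to the other with optical flow first). The function name and the identity-warp assumption are illustrative.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_consistency_loss(logits_t, logits_t1, eps=1e-8):
    """Symmetric KL between per-pixel class distributions of two
    consecutive frames (H, W, C logits). Low values mean the
    segmentation is temporally stable; this can serve as an
    auxiliary self-supervised signal on unlabeled video.

    NOTE: assumes the frames are pixel-aligned (identity warp),
    which is a simplification for illustration purposes only.
    """
    p = softmax(logits_t)
    q = softmax(logits_t1)
    kl_pq = (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=-1)
    kl_qp = (q * (np.log(q + eps) - np.log(p + eps))).sum(axis=-1)
    return 0.5 * (kl_pq + kl_qp).mean()
```

During adaptation, a loss like this would be added (weighted) to the supervised source-domain segmentation loss, so that unlabeled target-domain video contributes gradient signal without any annotation.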
ISSN: 0921-0296, 1573-0409
DOI: 10.1007/s10846-025-02220-9