Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation
Main Authors | , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 23.09.2023 |
Summary: Video amodal segmentation is a particularly challenging task in computer vision, which requires deducing the full shape of an object from its visible parts. Recently, some studies have achieved promising performance by using motion flow to integrate information across frames under a self-supervised setting. However, motion flow is clearly limited by two factors: moving cameras and object deformation. This paper presents a rethinking of previous works. We particularly leverage supervised signals with object-centric representation in *real-world scenarios*. The underlying idea is that the supervision signal of a specific object and the features from different views can mutually benefit the deduction of the full mask in any specific frame. We thus propose Efficient object-centric Representation amodal Segmentation (EoRaS). Specifically, beyond solely relying on supervision signals, we design a translation module to project image features into the Bird's-Eye View (BEV), which introduces 3D information to improve current feature quality. Furthermore, we propose a temporal module built on multi-view fusion layers, which is equipped with a set of object slots and interacts with features from different views via an attention mechanism to achieve sufficient completion of the object representation. As a result, the full mask of the object can be decoded from the image features updated by the object slots. Extensive experiments on both real-world and synthetic benchmarks demonstrate the superiority of our proposed method, achieving state-of-the-art performance. Our code will be released at https://github.com/kfan21/EoRaS.
DOI: 10.48550/arxiv.2309.13248
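The summary above outlines a three-stage pipeline: project per-frame image features into a BEV plane, let a set of object slots attend over front-view and BEV features across frames, and decode the amodal mask from the slot-updated image features. The following is a minimal, self-contained PyTorch sketch of that flow. Every name here (`BEVTranslation`, `MultiViewFusionLayer`, `AmodalSegmenter`), every dimension, and all internal details are illustrative assumptions, not the released EoRaS implementation; see the linked repository for the authors' code.

```python
# Illustrative sketch only: BEV projection -> slot/feature fusion over frames
# -> amodal mask decoding. All design choices are assumptions, not EoRaS itself.
import torch
import torch.nn as nn


class BEVTranslation(nn.Module):
    """Projects flattened front-view features onto a hypothetical BEV grid via attention."""

    def __init__(self, dim: int, bev_size: int = 16):
        super().__init__()
        self.bev_queries = nn.Parameter(torch.randn(bev_size * bev_size, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C) flattened front-view features for one frame
        q = self.bev_queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        bev, _ = self.attn(q, feats, feats)               # (B, bev_size**2, C)
        return bev


class MultiViewFusionLayer(nn.Module):
    """Object slots attend to front-view and BEV features, then update the image features."""

    def __init__(self, dim: int):
        super().__init__()
        self.slot_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.feat_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, slots, front, bev):
        views = torch.cat([front, bev], dim=1)            # fuse the two views
        slots = self.norm(slots + self.slot_attn(slots, views, views)[0])
        front = front + self.feat_attn(front, slots, slots)[0]
        return slots, front


class AmodalSegmenter(nn.Module):
    """End-to-end sketch: backbone -> BEV projection -> fusion over frames -> mask logits."""

    def __init__(self, dim: int = 64, num_slots: int = 4, num_layers: int = 2):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, kernel_size=3, stride=4, padding=1)
        self.to_bev = BEVTranslation(dim)
        self.slots = nn.Parameter(torch.randn(num_slots, dim))
        self.fusion = nn.ModuleList([MultiViewFusionLayer(dim) for _ in range(num_layers)])
        self.decoder = nn.Linear(dim, 1)                  # per-location amodal mask logit

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, T, 3, H, W); returns amodal mask logits at 1/4 resolution
        B, T = video.shape[:2]
        slots = self.slots.unsqueeze(0).expand(B, -1, -1)
        masks = []
        for t in range(T):
            f = self.backbone(video[:, t])                # (B, C, h, w)
            h, w = f.shape[-2:]
            front = f.flatten(2).transpose(1, 2)          # (B, h*w, C)
            bev = self.to_bev(front)
            for layer in self.fusion:                     # slots persist across frames
                slots, front = layer(slots, front, bev)
            logits = self.decoder(front).transpose(1, 2).reshape(B, 1, h, w)
            masks.append(logits)
        return torch.cat(masks, dim=1)                    # (B, T, h, w)


if __name__ == "__main__":
    model = AmodalSegmenter()
    clip = torch.randn(2, 3, 3, 64, 64)                  # 2 clips, 3 frames each
    print(model(clip).shape)                              # torch.Size([2, 3, 16, 16])
```

The key design point the sketch tries to mirror is that the object slots are carried across frames, so information from views where the object is less occluded can flow back into frames where it is heavily occluded before the mask is decoded.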