REAL-TIME DEEP NEURAL NETWORKS FOR MULTIPLE OBJECT TRACKING AND SEGMENTATION ON MONOCULAR VIDEO

Bibliographic Details
Published in International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XLIV-2/W1-2021, pp. 15-20
Main Authors Basharov, I., Yudin, D.
Format Journal Article / Conference Proceeding
Language English
Published Göttingen: Copernicus GmbH (Copernicus Publications), 15.04.2021

Summary: The paper addresses multiple object tracking and segmentation on monocular video captured by the camera of an unmanned ground vehicle. The authors investigate various deep neural network architectures for this task, paying special attention to models capable of real-time inference. They propose an approach that combines the modern SOLOv2 instance segmentation model, a neural network that generates an embedding for each detected object, and a modified Hungarian tracking algorithm. The Hungarian algorithm is modified to take into account geometric constraints on the positions of detected objects across the image sequence. The solution develops and improves on the state-of-the-art PointTrack method. Its effectiveness is demonstrated quantitatively and qualitatively on the popular KITTI MOTS dataset, which was collected with the cameras of a driverless car. The approach was implemented in software, and the formation of a two-dimensional point cloud within each detected image segment was accelerated using NVidia CUDA technology. The proposed instance segmentation module processes one image in 68 ms on average, and the embedding and tracking module in 24 ms, on an NVidia Tesla V100 GPU. This indicates that the proposed solution is promising for on-board computer vision systems of both unmanned vehicles and various robotic platforms.
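The geometrically constrained Hungarian matching step described in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cosine-distance cost, the pixel-shift gate, the threshold values, and all function and parameter names are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks(prev_embs, prev_centers, cur_embs, cur_centers,
                 max_shift=50.0, max_cost=0.5):
    """Match previous tracks to current detections (illustrative sketch).

    Cost is the cosine distance between appearance embeddings; pairs whose
    image-plane centers moved more than `max_shift` pixels are gated out,
    mimicking a geometric constraint on object positions between frames.
    """
    # Cosine distance matrix between L2-normalized embeddings.
    a = prev_embs / np.linalg.norm(prev_embs, axis=1, keepdims=True)
    b = cur_embs / np.linalg.norm(cur_embs, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T

    # Geometric gating: forbid assignments implying an implausible shift.
    shift = np.linalg.norm(
        prev_centers[:, None, :] - cur_centers[None, :, :], axis=2)
    cost[shift > max_shift] = 1e6

    # Hungarian (optimal) assignment on the gated cost matrix.
    rows, cols = linear_sum_assignment(cost)
    # Keep only matches under the cost threshold; the rest become new tracks.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]
```

For example, two objects with matching embeddings whose centers shift only a few pixels are paired, while a candidate pair whose centers are far apart is rejected by the gate even if the embeddings agree.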
ISSN:2194-9034
1682-1750
DOI:10.5194/isprs-archives-XLIV-2-W1-2021-15-2021