REAL-TIME DEEP NEURAL NETWORKS FOR MULTIPLE OBJECT TRACKING AND SEGMENTATION ON MONOCULAR VIDEO

Bibliographic Details
Published in International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XLIV-2/W1-2021, pp. 15-20
Main Authors Basharov, I., Yudin, D.
Format Journal Article / Conference Proceeding
Language English
Published Göttingen: Copernicus GmbH (Copernicus Publications), 15.04.2021

Summary: The paper addresses multiple object tracking and segmentation on monocular video captured by the camera of an unmanned ground vehicle. The authors investigate various deep neural network architectures for this task, paying special attention to models capable of real-time inference. They propose an approach that combines the modern SOLOv2 instance segmentation model, a neural network that generates an embedding for each detected object, and a modified Hungarian tracking algorithm. The Hungarian algorithm is modified to take into account geometric constraints on the positions of detected objects across the image sequence. The solution develops and improves on the state-of-the-art PointTrack method. Its effectiveness is demonstrated quantitatively and qualitatively on the popular KITTI MOTS dataset, which was collected with the cameras of a driverless car. The approach was implemented in software, and the formation of a two-dimensional point cloud within each detected image segment was accelerated using NVidia CUDA technology. The proposed instance segmentation module processes one image in 68 ms on average, and the embedding and tracking module in 24 ms, on an NVidia Tesla V100 GPU. This indicates that the proposed solution is promising for on-board computer vision systems of both unmanned vehicles and various robotic platforms.
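The geometrically constrained Hungarian matching step described in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cosine-distance cost, the pixel-shift gate, the threshold values, and all function and parameter names are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks(prev_embs, prev_centers, cur_embs, cur_centers,
                 max_shift=50.0, max_cost=0.5):
    """Match previous tracks to current detections (illustrative sketch).

    Cost is the cosine distance between appearance embeddings; pairs whose
    image-plane centers moved more than `max_shift` pixels are gated out,
    mimicking a geometric constraint on object positions between frames.
    """
    # Cosine distance matrix between L2-normalized embeddings.
    a = prev_embs / np.linalg.norm(prev_embs, axis=1, keepdims=True)
    b = cur_embs / np.linalg.norm(cur_embs, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T

    # Geometric gating: forbid assignments implying an implausible shift.
    shift = np.linalg.norm(
        prev_centers[:, None, :] - cur_centers[None, :, :], axis=2)
    cost[shift > max_shift] = 1e6

    # Hungarian (optimal) assignment on the gated cost matrix.
    rows, cols = linear_sum_assignment(cost)
    # Keep only matches under the cost threshold; the rest become new tracks.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]
```

For example, two objects with matching embeddings whose centers shift only a few pixels are paired, while a candidate pair whose centers are far apart is rejected by the gate even if the embeddings agree.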
ISSN:2194-9034
1682-1750
DOI:10.5194/isprs-archives-XLIV-2-W1-2021-15-2021