Enhancing the association in multi‐object tracking via neighbor graph

Bibliographic Details
Published in	International journal of intelligent systems Vol. 36; no. 11; pp. 6713 - 6730
Main Authors	Liang, Tianyi; Lan, Long; Zhang, Xiang; Peng, Xindong; Luo, Zhigang
Format	Journal Article
Language	English
Published	New York Hindawi Limited 01.11.2021
Subjects	data association; graph convolutional networks; Intelligent systems; multi‐object tracking; Object recognition; Switches; Tracking
Summary:	Most modern multi‐object tracking (MOT) systems for videos follow the tracking‐by‐detection paradigm, in which objects of interest are first located in each frame and then associated across frames to form complete trajectories. In this setting, the appearance features of objects usually provide the most important cues for data association, but they are highly susceptible to occlusions, illumination variations, and inaccurate detections, and thus easily lead to incorrect trajectories. To address this issue, in this study we propose to make full use of neighboring information. Our motivation derives from the observation that people tend to move in groups. As such, when an individual target's appearance changes markedly, an observer can still identify it from its neighbor context. To model the contextual information from neighbors, we first use the spatiotemporal relations among trajectories to efficiently select suitable neighbors for each target. Subsequently, we construct a neighbor graph for each target and its corresponding neighbors, then employ graph convolutional networks (GCNs) to model their relations and learn the graph features. To the best of our knowledge, this is the first work to explicitly leverage neighbor cues via GCNs in MOT. Finally, standardized evaluations on the MOT16 and MOT17 data sets demonstrate that our approach markedly reduces identity switches while achieving state‐of‐the‐art overall performance.
ISSN:	0884-8173 1098-111X
DOI:	10.1002/int.22565
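
Illustrative sketch (not part of the catalog record or the article): the summary describes selecting neighbors for a target, building a neighbor graph, and applying a GCN to obtain graph features. The Python/NumPy code below is one minimal way such a pipeline could look, under stated assumptions. The nearest-center selection rule, the star-shaped graph, the symmetric-normalized GCN layer, the mean-pooling readout, and all function names are hypothetical stand-ins; the record does not specify the authors' actual choices.

```python
import numpy as np

def select_neighbors(centers, target_idx, k=3):
    """Pick the k trajectories whose current box centers are closest
    to the target. ASSUMPTION: a purely spatial stand-in for the
    spatiotemporal selection mentioned in the summary."""
    k = min(k, len(centers) - 1)
    dists = np.linalg.norm(centers - centers[target_idx], axis=1)
    dists[target_idx] = np.inf  # never select the target itself
    return np.argsort(dists)[:k]

def build_neighbor_graph(target_idx, neighbor_idx):
    """Star graph: node 0 is the target, remaining nodes are neighbors.
    ASSUMPTION: the article may connect nodes differently."""
    n = 1 + len(neighbor_idx)
    adj = np.zeros((n, n))
    adj[0, 1:] = 1.0
    adj[1:, 0] = 1.0
    nodes = np.concatenate(([target_idx], neighbor_idx))
    return nodes, adj

def gcn_layer(adj, feats, weight):
    """One graph convolution with self-loops and symmetric degree
    normalization, followed by ReLU. ASSUMPTION: a common GCN variant,
    not necessarily the one used in the article."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)

def graph_feature(centers, appearance, target_idx, weight, k=3):
    """Context-aware feature for one target: mean-pooled GCN output
    over the target and its selected neighbors."""
    neighbor_idx = select_neighbors(centers, target_idx, k)
    nodes, adj = build_neighbor_graph(target_idx, neighbor_idx)
    hidden = gcn_layer(adj, appearance[nodes], weight)
    return hidden.mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0., 0.], [1., 0.], [0., 2.], [10., 10.], [3., 3.]])
    appearance = rng.random((5, 4))  # made-up per-trajectory appearance features
    weight = rng.random((4, 3))      # untrained, random layer weights
    print(graph_feature(centers, appearance, target_idx=0, weight=weight, k=2))
```

In a tracker, a feature of this kind could be compared between existing trajectories and new detections so that association does not rely on a single target's appearance alone; the weights here are random, so the printed values carry no meaning beyond showing the data flow.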