Pedestrian Tracking Based on Receptive Field Improvement: A One-Shot Multi-Object Tracking Approach Based on Vision Sensors

Bibliographic Details
Published in	IEEE Sensors Journal, p. 1
Main Authors	Li, Guofa, Ouyang, Delin, Chen, Xin, Chu, Wenbo, Lu, Bing, Zhang, Caizhi, Tang, Xiaolin, Guo, Gang
Format	Journal Article
Language	English
Published	IEEE 13.07.2023
Subjects	autonomous driving; Classification algorithms; deep learning; Feature extraction; Multi-object tracking; Object detection; one-shot; Real-time systems; receptive field; Sensors; Target tracking; Task analysis

More Information
Summary:	Multi-object tracking (MOT) in video sequences has gradually become one of the most essential tasks in computer vision. Because two-shot methods use two separate models for feature extraction, they cannot achieve real-time inference; one-shot methods have therefore been developed to accelerate inference by integrating the position estimation of targets of interest with the extraction of their corresponding appearance embedding features. However, the accuracy of one-shot MOT methods still lags behind that of two-shot methods. In our study, we design a reinforced one-shot MOT system to further improve MOT capability. First, to address changes in target scale, we present a receptive field module that applies dilated convolutions to acquire diverse receptive fields. Then, we design an attention mechanism network to extract channel and positional information. Finally, we combine the circle loss with Euclidean distance optimization and cross-entropy loss to enhance the learning of discriminative embeddings. Three public pedestrian tracking datasets are used to verify the effectiveness and superiority of the presented algorithm. In particular, we achieve 79.0 MOTA and 78.4 IDF1 on the train set of MOT16 with an inference speed of 24.1 FPS on an NVIDIA GeForce RTX 3090 GPU.
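The summary states that the receptive field module uses dilated convolutions to obtain diverse receptive fields. The record does not give the module's layout, so the sketch below only illustrates the general arithmetic behind that idea, assuming parallel 3x3 branches with different dilation rates placed on top of a backbone. The function names, the three-layer stride-2 backbone, and the dilation rates (1, 2, 3) are hypothetical choices for illustration and are not taken from the paper.

```python
def effective_kernel(k: int, dilation: int) -> int:
    """Spatial extent of a k x k kernel after inserting (dilation - 1) gaps between taps."""
    return dilation * (k - 1) + 1


def receptive_field(layers):
    """Receptive field (in input pixels) of a sequential stack of conv layers.

    layers: list of (kernel_size, stride, dilation) tuples in input-to-output order.
    """
    rf, jump = 1, 1  # start from one input pixel; jump = cumulative stride so far
    for k, s, d in layers:
        # Each layer widens the field by (effective kernel - 1) steps of the current jump.
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf


def branch_receptive_fields(backbone, dilations, k=3):
    """Receptive field of each hypothetical parallel dilated branch (stride 1) on the backbone."""
    return {d: receptive_field(backbone + [(k, 1, d)]) for d in dilations}


if __name__ == "__main__":
    # Hypothetical backbone: three 3x3 convolutions with stride 2 and no dilation.
    backbone = [(3, 2, 1)] * 3
    print(receptive_field(backbone))                       # 15
    print(branch_receptive_fields(backbone, (1, 2, 3)))    # {1: 31, 2: 47, 3: 63}
```

With identical kernel sizes and no extra parameters, the three branches cover 31, 47, and 63 input pixels respectively. This is the property that makes dilated branches a cheap way to respond to targets appearing at different scales.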
ISSN:	1530-437X 1558-1748
DOI:	10.1109/JSEN.2023.3293519