Focus On Details: Online Multi-object Tracking with Diverse Fine-grained Representation
Main Authors | , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 28.02.2023 |
Summary: | Discriminative representation is essential to keep a unique identifier for
each target in multiple object tracking (MOT). Some recent MOT methods extract
features of the bounding box region or the center point as identity embeddings.
However, when targets are occluded, these coarse-grained global representations
become unreliable. To this end, we propose exploring diverse fine-grained
representation, which describes appearance comprehensively from global and
local perspectives. This fine-grained representation requires high feature
resolution and precise semantic information. To effectively alleviate the
semantic misalignment caused by indiscriminate contextual information
aggregation, Flow Alignment FPN (FAFPN) is proposed for multi-scale feature
alignment aggregation. It generates semantic flow among feature maps from
different resolutions to transform their pixel positions. Furthermore, we
present a Multi-head Part Mask Generator (MPMG) to extract fine-grained
representation based on the aligned feature maps. Multiple parallel branches of
MPMG allow it to focus on different parts of targets to generate local masks
without label supervision. The diverse details in target masks facilitate
fine-grained representation. Finally, with a Shuffle-Group Sampling (SGS)
training strategy that balances positive and negative samples,
we achieve state-of-the-art performance on MOT17 and MOT20 test sets. Even on
DanceTrack, where the appearance of targets is extremely similar, our method
significantly outperforms ByteTrack by 5.0% on HOTA and 5.6% on IDF1. Extensive
experiments show that diverse fine-grained representation makes Re-ID
great again in MOT. |
---|---|
DOI: | 10.48550/arxiv.2302.14589 |
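
The summary says FAFPN generates a semantic flow between feature maps of different resolutions and uses it to transform pixel positions before aggregation. The record gives no implementation details, so the following is only a minimal NumPy sketch of flow-based warping in general, not the authors' code. The function name `warp_by_flow`, the `(2, H, W)` flow layout, and border clamping are all assumptions made for illustration; in a real network the flow would be predicted by a learned layer rather than passed in.

```python
import numpy as np

def warp_by_flow(feat, flow):
    """Bilinearly resample feat (C, H, W) at positions shifted by flow (2, H, W).

    Hypothetical helper: flow[0] holds horizontal (x) offsets and flow[1]
    vertical (y) offsets, both in pixels. Sample positions are clamped to
    the border of the feature map.
    """
    c, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Target sampling coordinates, clamped so every index stays in range.
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)

    # Integer corners surrounding each sampling position.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional parts act as bilinear interpolation weights.
    wx = x - x0
    wy = y - y0

    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy
```

Under these assumptions, a coarse map that has been upsampled to the fine map's resolution could be warped with such a function and then summed with the fine map, so that the aggregation combines features that refer to the same image location.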