Classification Matters: Improving Video Action Detection with Class-Specific Attention
Main Authors | , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 29.07.2024 |
Subjects | |
Summary: | Video action detection (VAD) aims to detect actors and classify their actions
in a video. We find that VAD suffers more from classification than from
localization of actors. Hence, we analyze how prevailing methods form features
for classification and find that they prioritize actor regions, yet often
overlook the contextual information essential for accurate
classification. Accordingly, we propose to reduce the bias toward the actor and
encourage attention to the context that is relevant to each action
class. By assigning a class-dedicated query to each action class, our model can
dynamically determine where to focus for effective classification. The proposed
model demonstrates superior performance on three challenging benchmarks with
significantly fewer parameters and less computation. |
---|---|
DOI: | 10.48550/arxiv.2407.19698 |
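
The summary's central idea, one dedicated query per action class that decides where to look, can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes plain scaled dot-product attention with no learned key/value projections, random placeholder values instead of trained parameters, and hypothetical names (`class_query_attention`, `tokens`, `class_queries`, `class_weights`) chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def class_query_attention(tokens, class_queries, class_weights):
    """Attend over video feature tokens once per action class.

    tokens:        N feature vectors of dimension D (actor and context regions)
    class_queries: C query vectors of dimension D, one per action class
    class_weights: C vectors of dimension D mapping pooled features to a logit
    Returns (attn, logits): a C x N attention map and C per-class logits.
    """
    scale = math.sqrt(len(tokens[0]))
    attn, logits = [], []
    for query, weight in zip(class_queries, class_weights):
        # Each class scores every token, so different classes can focus
        # on different regions instead of sharing one actor-centred feature.
        token_weights = softmax([dot(query, t) / scale for t in tokens])
        # Class-specific feature: attention-weighted sum of the tokens.
        pooled = [
            sum(w * t[d] for w, t in zip(token_weights, tokens))
            for d in range(len(query))
        ]
        attn.append(token_weights)
        logits.append(dot(pooled, weight))
    return attn, logits


# Placeholder data (assumption): random values stand in for learned features.
random.seed(0)
num_tokens, dim, num_classes = 6, 4, 3


def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


tokens = [rand_vec() for _ in range(num_tokens)]
class_queries = [rand_vec() for _ in range(num_classes)]
class_weights = [rand_vec() for _ in range(num_classes)]

attn, logits = class_query_attention(tokens, class_queries, class_weights)
for c, row in enumerate(attn):
    print(f"class {c}: most attended token = {row.index(max(row))}")
```

Each row of `attn` is a separate distribution over the feature tokens, which is what lets one class weight the actor region while another weights surrounding context. A trained model would learn the queries and would typically add key and value projections and multiple heads.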