Technical Report for Ego4D Long Term Action Anticipation Challenge 2023
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 04.07.2023 |
Summary: | In this report, we describe the technical details of our approach for the Ego4D Long-Term Action Anticipation Challenge 2023. The aim of this task is to predict a sequence of future actions that will take place at an arbitrary time or later, given an input video. To accomplish this, we introduce three improvements to the baseline model, which consists of an encoder that generates clip-level features from the video, an aggregator that integrates multiple clip-level features, and a decoder that outputs Z future actions: 1) a model ensemble of SlowFast and SlowFast-CLIP; 2) label smoothing to relax the order constraints on future actions; 3) constraining the predicted action classes (verb, noun) based on word co-occurrence. Our method outperformed the baseline and was the second-place solution on the public leaderboard. |
---|---|
DOI: | 10.48550/arxiv.2307.01467 |
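
As a rough illustration of two of the improvements described in the summary above, a minimal PyTorch-style sketch is given below. It is not the authors' implementation: the function names, tensor shapes, the assumption of independent verb and noun heads, and the specific way label smoothing is spread over neighbouring future steps are all illustrative assumptions inferred from the abstract.

```python
import torch

def cooccurrence_mask(verb_noun_pairs, num_verbs, num_nouns):
    # Binary mask of (verb, noun) combinations observed in the training
    # annotations; unseen combinations stay zero. (Hypothetical helper.)
    mask = torch.zeros(num_verbs, num_nouns)
    for v, n in verb_noun_pairs:
        mask[v, n] = 1.0
    return mask

def constrained_joint_scores(verb_logits, noun_logits, mask):
    # Joint (verb, noun) scores with never-co-occurring pairs zeroed out,
    # sketching improvement 3 (co-occurrence constraint).
    verb_prob = verb_logits.softmax(dim=-1)                    # (B, V)
    noun_prob = noun_logits.softmax(dim=-1)                    # (B, N)
    joint = verb_prob.unsqueeze(-1) * noun_prob.unsqueeze(-2)  # (B, V, N)
    return joint * mask                                        # broadcast over batch

def order_relaxed_targets(future_action_ids, num_classes, smooth=0.2):
    # One plausible reading of improvement 2: soft targets for the Z future
    # steps, where each step keeps most of its probability mass on its own
    # ground-truth action and shares `smooth` with the actions of the
    # neighbouring steps, relaxing the strict ordering constraint.
    Z = len(future_action_ids)
    targets = torch.zeros(Z, num_classes)
    for z, a in enumerate(future_action_ids):
        targets[z, a] += 1.0 - smooth
        neighbours = [t for t in (z - 1, z + 1) if 0 <= t < Z]
        for t in neighbours:
            targets[z, future_action_ids[t]] += smooth / len(neighbours)
    return targets
```

At inference time, for example, one could take the argmax over the masked joint scores at each of the Z future steps to obtain a (verb, noun) pair per step; the soft targets would only be used when training the decoder.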