Multi-scale Motion Feature Integration for Action Recognition

Bibliographic Details
Published in: 2023 9th International Conference on Computer and Communications (ICCC), pp. 1776-1781
Main Authors: Lai, Jinming; Zheng, Huicheng; Dang, Jisheng
Format: Conference Proceeding
Language: English
Published: IEEE, 08.12.2023
Summary: Analyzing video data with intricate temporal structures and extracting comprehensive motion information remains a significant challenge. In this work, we introduce the multi-scale motion feature integration (MMFI) network, which leverages two key modules for motion analysis: the progressive local context aggregation (PLCA) module and the multi-scale motion excitation (MSME) module. The PLCA module captures frame-level motion details by incrementally processing frame-wise differences near the input frame in the early stages of the network. The MSME module provides motion-attentive channel weights in deeper layers with higher dimensions, incorporating short- and long-range segment-level motion information. These modules synergistically capture motion details across various scales. Our approach is evaluated on the large-scale video dataset Something-Something V1, yielding state-of-the-art performance with minimal computational overhead.
ISSN: 2837-7109
DOI: 10.1109/ICCC59590.2023.10507593
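
The two ideas named in the summary, frame-wise differences near an input frame and motion-attentive channel weights built from short- and long-range segment differences, can be illustrated with a small sketch. This is not the authors' implementation: the record gives no architectural details, so the function names, the window `radius`, the `strides` values, and the use of a plain sigmoid over averaged differences are all assumptions chosen only to make the concepts concrete.

```python
import math


def frame_differences(frames, center, radius=2):
    """Differences between consecutive frames in a window around `center`.

    `frames` is a list of frames, each a flat list of pixel values.
    The window radius is a hypothetical parameter, not taken from the paper.
    """
    lo = max(0, center - radius)
    hi = min(len(frames) - 1, center + radius)
    return [
        [b - a for a, b in zip(frames[t], frames[t + 1])]
        for t in range(lo, hi)
    ]


def motion_channel_weights(pooled, strides=(1, 2)):
    """Per-segment, per-channel weights in (0, 1) derived from motion.

    `pooled[t][c]` is assumed to be the spatially pooled value of channel c
    at segment t. For each stride s (short- and long-range), the difference
    pooled[t + s][c] - pooled[t][c] is taken (0.0 past the last segment),
    the differences are averaged, and a sigmoid maps the result to a weight.
    """
    num_segments = len(pooled)
    num_channels = len(pooled[0])
    weights = []
    for t in range(num_segments):
        row = []
        for c in range(num_channels):
            diffs = [
                pooled[t + s][c] - pooled[t][c] if t + s < num_segments else 0.0
                for s in strides
            ]
            mean_diff = sum(diffs) / len(diffs)
            row.append(1.0 / (1.0 + math.exp(-mean_diff)))
        weights.append(row)
    return weights


if __name__ == "__main__":
    toy_frames = [[0, 0], [1, 2], [3, 3], [3, 5]]
    print(frame_differences(toy_frames, center=1, radius=1))

    toy_pooled = [[0.0, 1.0], [0.0, 2.0], [0.0, 4.0]]
    print(motion_channel_weights(toy_pooled))
```

In this toy version a channel whose pooled activation does not change across segments receives a neutral weight of 0.5, while a channel whose activation grows receives a weight above 0.5. A real network would learn such weights with convolutional layers rather than compute them from raw differences.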