Multi-scale Motion Feature Integration for Action Recognition

Bibliographic Details
Published in	2023 9th International Conference on Computer and Communications (ICCC), pp. 1776-1781
Main Authors	Lai, Jinming; Zheng, Huicheng; Dang, Jisheng
Format	Conference Proceeding
Language	English
Published	IEEE 08.12.2023
Subjects	action recognition; channel attention; Computational modeling; Computer architecture; Data mining; deep learning; Feature extraction; Motion segmentation; multi-scale feature; Residual neural networks; temporal modeling
Summary:	Analyzing video data with intricate temporal structures and extracting comprehensive motion information remains a significant challenge. In this work, we introduce the multi-scale motion feature integration (MMFI) network, which leverages two key modules for motion analysis: the progressive local context aggregation (PLCA) module and the multi-scale motion excitation (MSME) module. The PLCA module captures frame-level motion details by incrementally processing frame-wise differences near the input frame in the early stages of the network. The MSME module provides motion-attentive channel weights in deeper layers with higher dimensions, incorporating short- and long-range segment-level motion information. These modules synergistically capture motion details across various scales. Our approach is evaluated on the large-scale video dataset Something-Something V1, yielding state-of-the-art performance with minimal computational overhead.
ISSN:	2837-7109
DOI:	10.1109/ICCC59590.2023.10507593
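
The summary describes two ideas: frame-wise differences near the input (PLCA) and motion-derived channel weights computed over short and long temporal ranges (MSME). The sketch below illustrates only those two ideas in NumPy and is not the authors' implementation. The function names, the temporal offsets `steps=(1, 2)`, the sigmoid gating, and the residual form `x + x * w` are all assumptions made for illustration, and the learned layers a real network would use are left out entirely.

```python
import numpy as np


def frame_differences(clip):
    """Differences between adjacent frames.

    clip: array of shape (T, C, H, W).
    Returns an array of shape (T - 1, C, H, W).
    """
    return clip[1:] - clip[:-1]


def motion_channel_weights(features, step):
    """Per-channel weights in (0, 1) from feature differences at one temporal offset.

    features: array of shape (T, C, H, W), one feature map per segment.
    step: temporal offset (1 = short range, larger = longer range).
    Returns an array of shape (T, C).
    """
    num_segments = features.shape[0]
    assert 0 < step < num_segments, "step must be in [1, T - 1]"

    # Difference between each segment and the one `step` positions later;
    # the last `step` positions have no partner and stay zero.
    diff = np.zeros_like(features)
    diff[:num_segments - step] = features[step:] - features[:num_segments - step]

    # Spatial average pooling, then a sigmoid gate (an assumed choice).
    pooled = diff.mean(axis=(2, 3))
    return 1.0 / (1.0 + np.exp(-pooled))


def multi_scale_excitation(features, steps=(1, 2)):
    """Average the channel weights over several offsets and apply them residually."""
    weights = np.mean([motion_channel_weights(features, s) for s in steps], axis=0)
    return features + features * weights[:, :, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.normal(size=(8, 3, 4, 4))  # 8 frames, 3 channels, 4x4 pixels
    print(frame_differences(clip).shape)
    print(multi_scale_excitation(clip).shape)
```

In this simplified form, a clip with no motion produces zero differences, so every channel weight is 0.5 and the output is the input scaled by 1.5. Channels whose pooled differences are positive receive weights above 0.5, and those with negative pooled differences receive weights below it.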