Sports video temporal action detection technology based on an improved MSST algorithm

Sports videos contain a large number of irrelevant backgrounds and static frames, which affect the efficiency and accuracy of temporal action detection. To optimize sports video data processing and temporal action detection, an improved multi-level spatiotemporal transformer network model is propose...

Full description

Saved in:
Bibliographic Details
Published inNonlinear engineering Vol. 14; no. 1; pp. 1123 - 33
Main Authors Lai, Lixin, Fang, Yu
Format Journal Article
LanguageEnglish
Published Berlin De Gruyter 11.07.2025
Walter de Gruyter GmbH
Subjects
Online AccessGet full text
ISSN2192-8029
2192-8010
2192-8029
DOI10.1515/nleng-2025-0143

Cover

Loading…
More Information
Summary:Sports videos contain a large number of irrelevant backgrounds and static frames, which affect the efficiency and accuracy of temporal action detection. To optimize sports video data processing and temporal action detection, an improved multi-level spatiotemporal transformer network model is proposed. The model first optimizes the initial feature extraction of videos through an unsupervised video data preprocessing model based on deep residual networks. Subsequently, multi-scale features are generated through feature pyramid networks. The global spatiotemporal dependencies of actions are captured by a spatiotemporal encoder. The frame-level self-attention module further extracts keyframes and highlights temporal features, thereby improving detection accuracy. The accuracy of the proposed model was 0.6 at the beginning. After 300 iterations, the accuracy was 0.85. After 500 iterations, the highest accuracy was close to 0.9. The mAP of the improved model on the dataset reached 90.5%, which was higher than the 78.2% of the base model. The recall rate was 92.0%, the precision was 89.5%, and the calculation time was 220 ms. Meanwhile, the model shows balanced performance in detecting movements of different types of sports, especially in recognizing complex movements such as gymnastics and diving. This model effectively improves the efficiency and accuracy of time action detection through the collaborative action of multiple modules, demonstrating good applicability and robustness.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2192-8029
2192-8010
2192-8029
DOI:10.1515/nleng-2025-0143