Sports video temporal action detection technology based on an improved MSST algorithm

Bibliographic Details
Published in	Nonlinear engineering Vol. 14; no. 1; pp. 1123 - 33
Main Authors	Lai, Lixin, Fang, Yu
Format	Journal Article
Language	English
Published	Berlin: De Gruyter (Walter de Gruyter GmbH), 11.07.2025
Subjects	Accuracy; Data processing; Feature extraction; feature pyramid network; Modules; multiple time scales; spatiotemporal transformer; sports videos; temporal action detection; Video data
ISSN	2192-8029, 2192-8010
DOI	10.1515/nleng-2025-0143

Summary:	Sports videos contain many irrelevant backgrounds and static frames, which reduce the efficiency and accuracy of temporal action detection. To improve sports video data processing and temporal action detection, an improved multi-level spatiotemporal transformer network model is proposed. The model first optimizes initial video feature extraction with an unsupervised video data preprocessing model based on deep residual networks. A feature pyramid network then generates multi-scale features, and a spatiotemporal encoder captures the global spatiotemporal dependencies of actions. A frame-level self-attention module further extracts keyframes and highlights temporal features, thereby improving detection accuracy. The accuracy of the proposed model started at 0.6, reached 0.85 after 300 iterations, and peaked close to 0.9 after 500 iterations. The mAP of the improved model on the dataset reached 90.5%, compared with 78.2% for the base model. The recall was 92.0%, the precision was 89.5%, and the computation time was 220 ms. The model also shows balanced performance across different types of sports, especially in recognizing complex movements such as gymnastics and diving. Through the combined action of its modules, the model effectively improves the efficiency and accuracy of temporal action detection, demonstrating good applicability and robustness.
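
The summary mentions a frame-level self-attention module that extracts keyframes, but this record does not say how it is built. The following is only an illustrative sketch of one generic way such scoring could work: scaled dot-product self-attention over per-frame feature vectors, with frames ranked by the average attention they receive. The function names `frame_attention_scores` and `select_keyframes`, the use of identity projections instead of learned query/key/value weights, and the ranking rule are all assumptions made for illustration, not the authors' method.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def frame_attention_scores(features):
    """Score each frame by the mean attention it receives from all frames.

    features: list of T frame vectors, each a list of d floats.
    Assumption: queries and keys are the raw features (no learned projections).
    """
    num_frames = len(features)
    scale = math.sqrt(len(features[0]))
    received = [0.0] * num_frames
    for query in features:
        # Scaled dot-product logits between this frame and every frame.
        logits = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights = softmax(logits)
        for j, w in enumerate(weights):
            received[j] += w / num_frames
    return received


def select_keyframes(features, top_k):
    """Return the indices of the top_k highest-scoring frames, in time order."""
    scores = frame_attention_scores(features)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


# Hypothetical toy input: three frames with 2-dimensional features.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
keyframes = select_keyframes(frames, top_k=2)
```

Because each attention row sums to 1, the scores returned by `frame_attention_scores` also sum to 1 and can be read as a distribution over frames. Note that this simple ranking favors frames whose content is shared with many others; a trained module with learned projections may behave quite differently.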