Multi-modal temporal action segmentation for manufacturing scenarios
| Published in | Engineering applications of artificial intelligence Vol. 148; p. 110320 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 15.05.2025 |
Summary: Industrial robots have become prevalent in manufacturing due to their advantages of accuracy, speed, and reduced operator fatigue. Nevertheless, human operators play a crucial role in primary production lines. This study focuses on the temporal segmentation of human actions, aiming to identify the physical and cognitive behavior of operators working alongside collaborative robots. While existing literature explores temporal action segmentation datasets, there is a lack of evaluation for manufacturing tasks. This work assesses six state-of-the-art action segmentation models using the Human Action Multi-Modal Monitoring in Manufacturing (HA4M) dataset, in which subjects assemble an industrial object in realistic manufacturing scenarios. By employing Cross-Subject and Cross-Location evaluation protocols, the study not only demonstrates the effectiveness of these models in industrial settings but also introduces a new benchmark for evaluating generalization across different subjects and locations. The evaluation further includes new videos recorded in simulated industrial locations, assessed with both fully and semi-supervised learning approaches. The findings reveal that the Multi-Stage Temporal Convolutional Network++ (MS-TCN++) and the Action Segmentation Transformer (ASFormer) architectures perform strongly in both supervised and semi-supervised settings, including on the new data, particularly when trained with Skeletal features, advancing the capabilities of temporal action segmentation in real-world manufacturing environments. This research lays the foundation for addressing video activity understanding challenges in manufacturing and presents opportunities for future investigations.
Highlights:
- I3D and Skeletal features extracted from the HA4M dataset for TAS in manufacturing.
- Different set splits according to subjects and settings to understand feature reliability.
- Fully and semi-supervised learning approaches to assess the behavior of the models.
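The Cross-Subject protocol mentioned in the summary partitions the dataset so that no subject's videos appear in both the training and test sets, which measures generalization to unseen operators. A minimal sketch of such a split follows; the video names and subject IDs are illustrative placeholders, not taken from the HA4M dataset itself.

```python
# Sketch of a Cross-Subject split for temporal action segmentation.
# Assumption: each video is tagged with the ID of the subject performing
# the assembly; names and IDs below are hypothetical examples.

def cross_subject_split(videos, test_subjects):
    """Partition (video, subject) pairs so that no subject appears
    in both the training and the test set."""
    train, test = [], []
    for name, subject in videos:
        (test if subject in test_subjects else train).append(name)
    return train, test

videos = [
    ("assembly_01.mp4", "S1"),
    ("assembly_02.mp4", "S1"),
    ("assembly_03.mp4", "S2"),
    ("assembly_04.mp4", "S3"),
]

train, test = cross_subject_split(videos, test_subjects={"S3"})
# All of S1's and S2's videos land in train; only S3's videos are held out,
# so the model is evaluated on an operator it has never seen.
```

A Cross-Location split works the same way, keyed on the recording location of each video instead of the subject ID.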
ISSN: | 0952-1976 |
DOI: | 10.1016/j.engappai.2025.110320 |