
Graph-based framework for temporal human action recognition and segmentation in industrial context

Article: Articles in journals without a peer-review committee

Industry 5.0 places human operators at the center of industrial processes. In this context, analyzing human movements has become crucial for ensuring operator safety and improving productivity. More specifically, an accurate system for action recognition and segmentation is essential to identify and break down each action an operator performs. Such systems enable a range of applications, such as detecting fatigue and ergonomic risks, and providing targeted feedback to optimize workstations and task organization. However, existing action segmentation systems face challenges that limit their deployment in real industrial environments, including the complexity of industrial tasks and the similarity of assembly gestures. To address these challenges, this work presents a system explicitly designed for industrial assembly environments that accurately segments and recognizes operator actions. It uses skeletal data and leverages graph-based representations to capture operator posture and model spatio-temporal movement patterns. It employs an encoder–decoder architecture, enhanced with an attention mechanism and an aggregation block, to effectively learn action dynamics. Additionally, it integrates decoder outputs at multiple temporal resolutions to precisely detect the start and end of actions. The proposed approach achieved competitive Mean-over-Frames (MoF) accuracies of 74.59% on the Industrial Human Action Recognition Dataset and 93.11% on the Human Action Multi-Modal Monitoring in Manufacturing dataset. Extensive experiments and ablation studies were conducted to provide deeper insights into its strengths. Visualizations are presented to illustrate the method’s handling of complex assembly video segmentation while also revealing specific cases for further improvement.
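To make the graph-based skeletal representation concrete, the sketch below shows one normalized graph-convolution step over a toy skeleton. The joint set, edge list, and dimensions are illustrative assumptions, not the paper's actual graph definition or architecture; the weights are random rather than learned.

```python
import numpy as np

# Hypothetical 5-joint skeleton (head, torso, two hands, hip); edges follow
# the kinematic chain. All names and sizes are illustrative assumptions.
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
num_joints, feat_dim, out_dim = 5, 3, 8

# Adjacency with self-loops, then symmetric normalization D^{-1/2} A D^{-1/2}
A = np.eye(num_joints)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_norm = D_inv_sqrt @ A @ D_inv_sqrt

rng = np.random.default_rng(0)
X = rng.normal(size=(num_joints, feat_dim))   # per-joint 3D coordinates
W = rng.normal(size=(feat_dim, out_dim))      # projection weights (random here)

# One graph-convolution step: each joint mixes with its neighbors,
# is projected to a feature space, then passed through ReLU.
H = np.maximum(A_norm @ X @ W, 0.0)
print(H.shape)  # (5, 8)
```

In a full spatio-temporal model, such a spatial step would be stacked with temporal convolutions over the frame axis so that movement patterns, not just poses, are encoded.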