STr-GCN: Dual Spatial Graph Convolutional Network and Transformer Graph Encoder for 3D Hand Gesture Recognition
Conference: communication with proceedings at an international conference
Skeleton-based hand gesture recognition is a challenging
task that has attracted considerable attention in recent years,
especially with the rise of Graph Neural Networks. In this
paper, we propose a new deep learning architecture for hand
gesture recognition from 3D hand skeleton data, which we call
STr-GCN. It decouples the spatial and temporal learning
of the gesture by leveraging Graph Convolutional Networks
(GCN) and Transformers. The key idea is to combine two
powerful networks: a Spatial Graph Convolutional Network
unit that models intra-frame interactions to extract discriminative
features from the different hand joints, and a Transformer
Graph Encoder based on a Temporal Self-Attention
module that captures inter-frame correlations. We evaluate
the performance of our method on three benchmarks: the
SHREC’17 Track dataset, the Briareo dataset, and the First Person
Hand Action dataset. The experiments demonstrate the effectiveness
of our approach, which matches or outperforms the state of the
art. The code to reproduce our results is available at this link.
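To make the decoupled design concrete, below is a minimal PyTorch sketch of the idea described above: a spatial GCN that mixes features across hand joints within each frame, followed by a Transformer encoder that attends across frames. The class names (SpatialGCNUnit, STrGCN), layer sizes, fixed normalized adjacency, and mean-pooling scheme are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class SpatialGCNUnit(nn.Module):
    """One graph convolution over the hand-skeleton graph (intra-frame)."""

    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        # Normalized adjacency of the hand skeleton, shape (J, J); fixed
        # here, though a learnable adjacency is a common variant.
        self.register_buffer("A", adjacency)
        self.proj = nn.Linear(in_dim, out_dim)
        self.act = nn.ReLU()

    def forward(self, x):  # x: (B, T, J, C) = batch, frames, joints, channels
        x = torch.einsum("ij,btjc->btic", self.A, x)  # aggregate neighbor joints
        return self.act(self.proj(x))


class STrGCN(nn.Module):
    """Spatial GCN per frame, then temporal self-attention across frames."""

    def __init__(self, adjacency, num_classes, in_dim=3, hid_dim=64,
                 heads=4, layers=2):
        super().__init__()
        self.gcn = SpatialGCNUnit(in_dim, hid_dim, adjacency)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=hid_dim, nhead=heads, batch_first=True
        )
        # Temporal self-attention over the frame sequence (inter-frame).
        self.temporal = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(hid_dim, num_classes)

    def forward(self, x):          # x: (B, T, J, 3) 3D joint coordinates
        x = self.gcn(x)            # (B, T, J, H) spatial features per frame
        x = x.mean(dim=2)          # pool joints -> one token per frame (B, T, H)
        x = self.temporal(x)       # self-attention across frames
        return self.head(x.mean(dim=1))  # average over time, then classify


# Hypothetical usage: 22 joints as in SHREC'17, identity adjacency as a
# placeholder for the real (normalized) skeleton graph.
J = 22
model = STrGCN(torch.eye(J), num_classes=14)
logits = model(torch.randn(8, 32, J, 3))  # 8 clips of 32 frames each
```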