PMoET: Going Wider than Deeper using the Parallel Mixture-of-Experts Transformer for 3D Hand Gesture Recognition
Conference: Paper published in the proceedings of an international conference
The startup Mistral AI has released Mixtral, a Transformer model built on
Mixture-of-Experts (MoE) layers. MoE architectures have gained prominence in
both Large Language Modeling (LLM) and Computer Vision because they scale
efficiently: for each input they dynamically select a small ensemble of
specialized sub-models (a group of experts) rather than activating the entire
model, improving overall performance at a constant computational cost.
Following Mixtral, GShard, GLaM, and the Switch Transformer, we propose the
“Parallel Mixture-of-Experts Transformer (PMoET)”, which is coupled with a
Spatial Linear Layer (SLL) and uses parallel MoE layers to go wider instead of
deeper than the traditional Transformer. The PMoET model can be scaled in
width by using parallel MoE layers in place of the single Feed-Forward Network
(FFN) layer of the traditional Transformer (see the sketch below). Using an
MoE architecture for graph learning on dynamic 3D skeleton data brings
significant benefits to 3D hand gesture recognition, since the model can
handle complex and varied skeleton data efficiently. To this end, the PMoET
model decouples spatial and temporal graph learning by integrating MoE layers
in a parallel configuration, an effective approach for recognizing 3D hand
gestures. By focusing on both the spatial configuration of the hand and the
temporal dynamics of gestures, the PMoET architecture provides a robust and
efficient solution for complex gesture recognition tasks. Finally, extensive
experiments on the SHREC’17 dataset provide strong evidence of PMoET’s
effectiveness: PMoET improves overall performance and training stability while
reducing the computational cost.
The experiments demonstrate the efficiency and superiority of the PMoET model
variants, particularly PMoET Model 4, which surpasses the state-of-the-art
(SOTA) performance achieved by the DG-STA method.
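
The sketch below is a minimal PyTorch-style illustration of the wider-not-deeper
idea described above: several top-k gated MoE layers run in parallel on the
attention output and their results are summed, standing in for the single FFN of
a standard Transformer block. The expert count, the top-k router, the dense
dispatch loop, and all dimensions are illustrative assumptions, not the authors'
released implementation; the Spatial Linear Layer (SLL) and the decoupled
spatial/temporal streams are omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Expert(nn.Module):
        # One feed-forward expert, shaped like a standard Transformer FFN.
        def __init__(self, d_model, d_hidden):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                                     nn.Linear(d_hidden, d_model))

        def forward(self, x):
            return self.net(x)

    class MoE(nn.Module):
        # Token-wise top-k gated mixture of experts (dense reference version:
        # every expert runs, only the top-k gated contributions are kept).
        def __init__(self, d_model, d_hidden, n_experts=4, top_k=2):
            super().__init__()
            self.experts = nn.ModuleList(
                [Expert(d_model, d_hidden) for _ in range(n_experts)])
            self.gate = nn.Linear(d_model, n_experts)
            self.top_k = top_k

        def forward(self, x):                       # x: (batch, tokens, d_model)
            weights, idx = self.gate(x).topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                gate_e = ((idx == e).float() * weights).sum(dim=-1, keepdim=True)
                out = out + gate_e * expert(x)
            return out

    class ParallelMoEBlock(nn.Module):
        # Transformer block that goes wider: several MoE layers run in
        # parallel in place of the single FFN, and their outputs are summed.
        def __init__(self, d_model=64, n_heads=4, d_hidden=128, n_parallel=3):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.moes = nn.ModuleList(
                [MoE(d_model, d_hidden) for _ in range(n_parallel)])

        def forward(self, x):
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            h = self.norm2(x)
            return x + sum(moe(h) for moe in self.moes)  # width instead of depth

    # Toy usage: 20 frames of a 22-joint hand skeleton flattened into tokens.
    block = ParallelMoEBlock()
    tokens = torch.randn(8, 20 * 22, 64)
    print(block(tokens).shape)                       # torch.Size([8, 440, 64])

Summing the parallel MoE outputs is only one plausible reading of the "parallel
configuration" described in the abstract; the paper's actual fusion of the
parallel branches (summation, concatenation, or gating) may differ.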