Evaluating dual-path temporal fusion strategies for multi-modal hand gesture recognition under limb-position variability.

Journal: Journal of neural engineering
Published Date:

Abstract

OBJECTIVE: Wearable biosignal-based hand gesture recognition (HGR) is a key enabling technology for prosthetic hand control, but its reliability is often degraded by limb-position variability and other real-world confounding factors. This study systematically compares dual-path temporal fusion architectures for co-located surface electromyography (sEMG) and pressure-based force myography (pFMG), with emphasis on robustness, computational efficiency, and interpretability under conditions relevant to practical prosthetic use.

APPROACH: Three dual-path Temporal Convolutional Network (DFF-TCN) architectures were designed and evaluated, each integrating sEMG and pFMG with a different fusion strategy: (1) a baseline concatenation-based model combining decision-level and feature-level fusion, (2) a decision-level cross-attention variant, and (3) a feature-level cross-attention variant. All models were evaluated under identical training and testing protocols using a custom dataset collected from ten participants performing nine functional hand gestures across multiple static and dynamic arm positions.

MAIN RESULTS: Across all evaluated conditions, the concatenation-based DFF-TCN achieved balanced performance, with a mean classification accuracy of 95.88%, while the decision-level and feature-level cross-attention variants achieved accuracies of 90.65% and 94.02%, respectively. Computational profiling showed that the concatenation-based model also achieved the lowest inference latency (1.70 ms), indicating suitability for real-time deployment. Explainable artificial intelligence analysis using Integrated Gradients revealed complementary contributions from sEMG (54.08%) and pFMG (45.92%), with contribution patterns varying across gestures and subjects.

SIGNIFICANCE: The results demonstrate that different fusion strategies offer distinct trade-offs between recognition performance, computational cost, and robustness. In particular, the concatenation-based model provides a favorable balance for real-time prosthetic hand control, while attention-based variants offer additional modeling flexibility. These findings provide practical guidance for selecting multi-modal fusion architectures in wearable human-machine interface (HMI) systems and support the continued use of co-located sEMG-pFMG sensing in prosthetic and rehabilitation applications.
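
The abstract reports per-modality contribution percentages from Integrated Gradients but does not say how they were aggregated. The sketch below illustrates one plausible reading, not the authors' procedure: attributions are approximated with a midpoint Riemann sum along the straight path from a baseline to the input, and each modality's share is taken as its fraction of the total absolute attribution. The toy sigmoid scorer, the all-zeros baseline, the channel counts, and the function names are all assumptions introduced for illustration; the real DFF-TCN models are not reproduced here.

```python
import numpy as np

def model(x, w, b):
    """Toy differentiable score (stand-in for a trained network): sigmoid(w . x + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def model_grad(x, w, b):
    """Analytic gradient of the toy score with respect to the input x."""
    s = model(x, w, b)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, w, b, steps=300):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # shape (steps, n_features)
    grads = np.array([model_grad(p, w, b) for p in path])
    return (x - baseline) * grads.mean(axis=0)

def modality_shares(attr, n_semg):
    """Percentage of total absolute attribution per modality (assumed aggregation)."""
    semg = np.abs(attr[:n_semg]).sum()
    pfmg = np.abs(attr[n_semg:]).sum()
    total = semg + pfmg
    return 100.0 * semg / total, 100.0 * pfmg / total

# Hypothetical setup: 8 sEMG features followed by 8 pFMG features.
n_semg, n_pfmg = 8, 8
w = np.linspace(-0.5, 0.5, n_semg + n_pfmg)   # made-up model weights
b = 0.1
x = np.cos(np.arange(n_semg + n_pfmg))        # made-up input feature vector
baseline = np.zeros_like(x)                   # assumed all-zeros baseline

attr = integrated_gradients(x, baseline, w, b)
semg_pct, pfmg_pct = modality_shares(attr, n_semg)
print(f"sEMG share: {semg_pct:.2f}%  pFMG share: {pfmg_pct:.2f}%")
```

A useful sanity check for any such implementation is the completeness property: the attributions should sum, up to numerical error, to the difference between the model output at the input and at the baseline.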

Authors
