Evaluating dual-path temporal fusion strategies for multi-modal hand gesture recognition under limb-position variability.
Journal:
Journal of neural engineering
Published Date:
Feb 16, 2026
Abstract
OBJECTIVE: Wearable biosignal-based hand gesture recognition (HGR) is a key enabling
technology for prosthetic hand control, but its reliability is often affected by limb-position
variability and other real-world confounding factors. This study aims to systematically
compare dual-path temporal fusion architectures for co-located surface electromyography
(sEMG) and pressure-based force myography (pFMG), with emphasis on robustness,
computational efficiency, and interpretability under conditions relevant to practical
prosthetic use.

APPROACH: Three dual-path Temporal Convolutional Network (DFF-TCN)
architectures were designed to integrate sEMG and pFMG using
different fusion strategies: (1) a baseline concatenation-based model combining
decision-level and feature-level fusion, (2) a decision-level cross-attention variant, and (3) a
feature-level cross-attention variant. All models were evaluated under identical training
and testing protocols using a custom dataset collected from ten participants performing
nine functional hand gestures across multiple static and dynamic arm positions.

MAIN RESULTS: Across all evaluated conditions, the concatenation-based DFF-TCN achieved
balanced performance and the highest mean classification accuracy (95.88%), while the
decision-level and feature-level cross-attention variants achieved 90.65% and 94.02%,
respectively.
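The three fusion strategies can be illustrated with generic building blocks. The following is a minimal, framework-free sketch, not the DFF-TCN itself. The function names, toy shapes, mean pooling, equal-weight probability averaging, and identity query/key/value projections are all assumptions. The abstract also does not specify how the baseline model combines its decision-level and feature-level paths.

```python
import math

# Hypothetical setup: each modality branch (e.g. one TCN path) emits a
# sequence of T feature vectors of dimension D, as nested Python lists.

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mean_pool(seq):
    """Average a T x D sequence over time into a D-dim embedding."""
    T = len(seq)
    return [sum(frame[d] for frame in seq) / T for d in range(len(seq[0]))]

def feature_concat(semg_seq, pfmg_seq):
    """Feature-level fusion: pool each branch, then concatenate the embeddings."""
    return mean_pool(semg_seq) + mean_pool(pfmg_seq)

def decision_average(semg_logits, pfmg_logits):
    """Decision-level fusion: average the two branches' class probabilities."""
    assert len(semg_logits) == len(pfmg_logits)
    p1, p2 = softmax(semg_logits), softmax(pfmg_logits)
    return [(a + b) / 2 for a, b in zip(p1, p2)]

def cross_attention(query_seq, context_seq):
    """Scaled dot-product cross-attention with identity projections:
    every query frame (e.g. sEMG) attends over all context frames (e.g. pFMG)."""
    d_k = len(query_seq[0])
    assert len(context_seq[0]) == d_k, "identity projections need equal dims"
    out = []
    for q in query_seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in context_seq]
        weights = softmax(scores)
        # Weighted sum of context frames (keys double as values here).
        out.append([sum(w * v[j] for w, v in zip(weights, context_seq))
                    for j in range(d_k)])
    return out

# Toy usage with T=2, D=2 per modality.
semg = [[1.0, 0.0], [0.0, 1.0]]
pfmg = [[0.5, 0.5], [1.0, 0.0]]
fused_feat = feature_concat(semg, pfmg)          # [0.5, 0.5, 0.75, 0.25]
fused_prob = decision_average([2.0, 0.0], [0.0, 2.0])
fused_attn = cross_attention(semg, pfmg)
```

In this sketch, decision-level fusion discards within-branch detail before combining, while cross-attention lets one modality reweight the other frame by frame. That extra flexibility comes with extra computation per window.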
Computational profiling showed that the concatenation-based model also achieved the
lowest inference latency (1.70 ms), indicating suitability for real-time deployment.
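Latency figures such as 1.70 ms depend on hardware, input window length, and batch size, and the abstract does not state these. The following is a minimal sketch of one common way to measure single-window inference latency in Python. It assumes a hypothetical `predict` callable that classifies one input window. The warm-up and run counts are arbitrary choices.

```python
import statistics
import time

def profile_latency_ms(predict, sample, warmup=10, runs=100):
    """Return (mean, p95) latency in milliseconds for single-sample inference.

    predict: hypothetical callable mapping one input window to a prediction.
    """
    if runs < 2:
        raise ValueError("need at least 2 timed runs for a percentile")
    for _ in range(warmup):
        predict(sample)  # warm caches / lazy initialisation before timing
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = statistics.quantiles(timings, n=20)[-1]
    return statistics.mean(timings), p95

# Toy usage: a stand-in "model" that just sums a 400-sample window.
mean_ms, p95_ms = profile_latency_ms(lambda window: sum(window), [0.1] * 400)
```

It can help to report a tail percentile alongside the mean for real-time control, where occasional slow inferences matter more than the average.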
Explainable artificial intelligence analysis using Integrated Gradients revealed
complementary contributions from sEMG (54.08%) and pFMG (45.92%), with
contribution patterns varying across gestures and subjects.

SIGNIFICANCE: The results
demonstrate that different fusion strategies offer distinct trade-offs between recognition
performance, computational cost, and robustness. In particular, the concatenation-based
model provides a favorable balance for real-time prosthetic hand control, while
attention-based variants offer additional modeling flexibility. These findings provide
practical guidance for selecting multi-modal fusion architectures in wearable human-machine interface (HMI) systems
and support the continued use of co-located sEMG-pFMG sensing in prosthetic and
rehabilitation applications.
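Integrated Gradients attributes a prediction to each input feature. For feature i it computes IG_i = (x_i - x'_i) times the path integral over alpha in [0, 1] of dF/dx_i evaluated at x' + alpha(x - x'), where x' is a baseline input. The sketch below shows how per-modality shares like the reported sEMG/pFMG split could be derived. It makes four assumptions, and the abstract states none of them: a toy logistic model with an analytic gradient stands in for the network, the baseline is all zeros, channels 0-1 are treated as sEMG and channels 2-3 as pFMG, and percentages come from summing absolute attributions per modality. All names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Hypothetical toy classifier score: logistic regression on channel features."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of model() with respect to each input feature."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Right-endpoint Riemann approximation of the IG path integral."""
    n = len(x)
    grad_sum = [0.0] * n
    for k in range(1, steps + 1):
        alpha = k / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, b)):
            grad_sum[i] += g
    return [(xi - bi) * gs / steps for xi, bi, gs in zip(x, baseline, grad_sum)]

def modality_shares(attr, groups):
    """Percentage of total |attribution| carried by each channel group (assumed rule)."""
    mags = {name: sum(abs(attr[i]) for i in idx) for name, idx in groups.items()}
    total = sum(mags.values())
    return {name: 100.0 * m / total for name, m in mags.items()}

# Toy usage: 4 channels, hypothetically 0-1 = sEMG, 2-3 = pFMG.
x = [0.8, 0.3, 0.5, 0.1]
baseline = [0.0, 0.0, 0.0, 0.0]
w, b = [1.2, -0.4, 0.9, 0.6], -0.2
attr = integrated_gradients(x, baseline, w, b)
shares = modality_shares(attr, {"sEMG": [0, 1], "pFMG": [2, 3]})
```

A useful sanity check is IG's completeness property: the attributions should sum approximately to model(x) - model(baseline). In a real study, shares like these would be averaged over many windows, gestures, and subjects.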