Interpretable AI for Time-Series: Multi-Model Heatmap Fusion with Global Attention and NLP-Generated Explanations
Journal: arXiv
Published Date: Jun 30, 2025
Abstract
In this paper, we present a novel framework that enhances model
interpretability by fusing heatmaps produced separately by a ResNet and by a
restructured 2D Transformer with globally weighted input saliency. We address
the critical problem of spatial-temporal misalignment in existing
interpretability methods: convolutional networks fail to capture global
context, while Transformers lack localized precision, a limitation that impedes
actionable insights in safety-critical domains such as healthcare and industrial
monitoring. Our method merges the ResNet's gradient-weighted activation maps with
Transformer attention rollout into a unified visualization, achieving full
spatial-temporal alignment while preserving real-time performance. Empirical
evaluations on clinical (ECG arrhythmia detection) and industrial (energy
consumption prediction) datasets demonstrate significant improvements: the
hybrid framework achieves 94.1% accuracy (F1 = 0.93) on the PhysioNet dataset and
reduces regression error to RMSE = 0.28 kWh (R² = 0.95) on the UCI Energy
Appliance dataset, outperforming standalone ResNet, Transformer, and
InceptionTime baselines by 3.8-12.4%. An NLP module translates the fused heatmaps
into domain-specific narratives (e.g., "Elevated ST-segment between 2-4 seconds
suggests myocardial ischemia"), validated with BLEU-4 (0.586) and ROUGE-L
(0.650) scores. By formalizing interpretability as causal fidelity and
spatial-temporal alignment, our approach bridges the gap between technical
outputs and stakeholder understanding, offering a scalable solution for
transparent, time-aware decision-making.
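The fusion of gradient-weighted activation maps and attention rollout described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses standard attention rollout (head-averaged attention plus identity for the residual path, row-normalized and multiplied across layers), a 1D Grad-CAM-style map (channel weights from time-averaged gradients, ReLU of the weighted activation sum), linear resampling to the input length, and a hypothetical convex combination with weight `alpha`. Summarizing the rollout matrix by its column means as the "globally weighted" token saliency is also an assumption, since the abstract does not specify the weighting.

```python
import numpy as np


def attention_rollout(attentions):
    """Standard attention rollout over a list of per-layer maps, each (heads, T, T)."""
    num_tokens = attentions[0].shape[-1]
    identity = np.eye(num_tokens)
    rollout = identity.copy()
    for layer_attn in attentions:
        attn = layer_attn.mean(axis=0)                   # average over heads
        attn = attn + identity                           # model the residual connection
        attn = attn / attn.sum(axis=-1, keepdims=True)   # keep rows stochastic
        rollout = attn @ rollout                         # compose with earlier layers
    return rollout


def grad_cam_1d(activations, gradients):
    """1D Grad-CAM-style map from last-conv activations and gradients, both (channels, T')."""
    weights = gradients.mean(axis=1)                     # one weight per channel
    return np.maximum(weights @ activations, 0.0)        # ReLU of weighted sum, shape (T',)


def resample(x, length):
    """Linearly interpolate a 1D map onto `length` evenly spaced points."""
    src = np.linspace(0.0, 1.0, num=len(x))
    dst = np.linspace(0.0, 1.0, num=length)
    return np.interp(dst, src, x)


def minmax(x, eps=1e-8):
    """Scale a map to [0, 1); eps avoids division by zero for flat maps."""
    return (x - x.min()) / (x.max() - x.min() + eps)


def fuse_heatmaps(cam, token_saliency, length, alpha=0.5):
    """Hypothetical fusion rule: convex combination of the two normalized maps."""
    cam_n = minmax(resample(cam, length))
    attn_n = minmax(resample(token_saliency, length))
    return alpha * cam_n + (1.0 - alpha) * attn_n


# Usage with random stand-ins for real model outputs (all shapes hypothetical).
rng = np.random.default_rng(0)
length = 500                                             # input samples
attns = [rng.random((4, 50, 50)) for _ in range(3)]      # 3 layers, 4 heads, 50 tokens
attns = [a / a.sum(axis=-1, keepdims=True) for a in attns]
rollout = attention_rollout(attns)
token_saliency = rollout.mean(axis=0)                    # assumed "global" weighting
cam = grad_cam_1d(rng.standard_normal((16, 125)), rng.standard_normal((16, 125)))
fused = fuse_heatmaps(cam, token_saliency, length)
print(fused.shape)                                       # (500,)
```

Normalizing each map before combining keeps either branch from dominating purely because of scale; in practice `alpha` (or a learned weighting) would need tuning per task.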
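To make the heatmap-to-narrative step concrete, here is a deliberately simple template-based sketch. The abstract does not describe the NLP module's internals, so this is not the paper's method: the function names, the 0.6 threshold, and the caller-supplied `finding` and `interpretation` strings are hypothetical. The sketch finds the longest contiguous run of the fused heatmap above a threshold, converts sample indices to seconds using the sampling rate `fs`, and fills a sentence template.

```python
import numpy as np


def salient_window(heatmap, fs, threshold=0.6):
    """Return (start_s, end_s) of the longest run with heatmap >= threshold, or None."""
    mask = np.append(heatmap >= threshold, False)  # sentinel closes a trailing run
    best, start = None, None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i                              # run begins
        elif not flag and start is not None:
            if best is None or (i - start) > (best[1] - best[0]):
                best = (start, i)                  # end index is exclusive
            start = None
    if best is None:
        return None
    return best[0] / fs, best[1] / fs              # samples -> seconds


def narrate(heatmap, fs, finding, interpretation, threshold=0.6):
    """Fill a fixed sentence template; the clinical wording is supplied by the caller."""
    window = salient_window(heatmap, fs, threshold)
    if window is None:
        return "No time window exceeded the saliency threshold."
    start_s, end_s = window
    return f"{finding} between {start_s:g}-{end_s:g} seconds suggests {interpretation}."


# Usage: a synthetic 10 s heatmap at a hypothetical 250 Hz, salient from 2 s to 4 s.
fs = 250
heatmap = np.zeros(10 * fs)
heatmap[2 * fs:4 * fs] = 0.9
print(narrate(heatmap, fs, "Elevated ST-segment", "myocardial ischemia"))
# prints: Elevated ST-segment between 2-4 seconds suggests myocardial ischemia.
```

A template like this guarantees that the reported time window matches the heatmap exactly, but the domain-specific wording has to come from an expert-curated mapping or a learned generator.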