Explainability in action: A metric-driven assessment of local explanations for healthcare tabular models

Journal: medRxiv
Published Date:

Abstract

Explainable AI (XAI) is essential in clinical machine learning, yet explanation quality is rarely evaluated quantitatively in a reproducible, comparable way. We introduce a metric-driven framework for tabular healthcare XAI that consolidates six established quantitative evaluation metrics (fidelity, simplicity, consistency, robustness, precision, coverage), each applicable to specific families of explanation methods, into explicit equations, pairs them with a pre-specified focal-model selection protocol, and provides open-source code plus a method–metric applicability map. All quantitative metrics are computed on local (per-instance) explanations; any global summaries (e.g., aggregated SHAP importances, EBM main-effect shapes, or TabNet aggregated importances) are reported descriptively only. Using the framework, we evaluate five widely used approaches (LIME, SHAP, Anchors, EBM, and TabNet) across four healthcare tabular datasets, spanning post-hoc feature attribution (LIME, SHAP), post-hoc rule extraction (Anchors), and inherently interpretable models (EBM, TabNet). For tree ensembles, we additionally report Random Forest global importances (Gini/MDI and permutation) as descriptive cross-checks alongside EBM/SHAP/TabNet global profiles. Empirically, SHAP (TreeSHAP) attains exact score fidelity (1.0) and near-perfect decision fidelity for tree ensembles; LIME yields simpler but less robust, lower-fidelity explanations; TabNet most often produces the simplest explanations across thresholds, while EBM and TabNet offer the most robust explanations under small perturbations; Anchors returns high-precision, human-readable rules whose coverage decreases as precision thresholds tighten. LIME and SHAP show moderate-to-high agreement on salient features, and global profiles (reported descriptively) align with known risk factors.
Why this matters: the framework enables apples-to-apples comparisons, reduces confounds, and turns narrative guidance into testable, quantitative practice, helping practitioners choose XAI methods by application priority (e.g., fidelity, robustness, rule precision/coverage). Although demonstrated in healthcare, it generalizes to high-stakes tabular ML. Source code: https://github.com/matifq/XAI_Tab_Health.
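To make three of the metrics named above more concrete, the following is a minimal illustrative sketch in Python. It is not taken from the paper or its repository: the function names, the default threshold of 0.05, and the exact formulas are assumptions based on common definitions (decision fidelity as label agreement between a model and a local surrogate, simplicity as the number of features whose share of absolute attribution exceeds a threshold, and rule precision/coverage as usually defined for Anchors-style rules). The paper's own equations may differ.

```python
import numpy as np


def decision_fidelity(model_labels, surrogate_labels):
    """Fraction of instances where the surrogate's label matches the model's label.

    Assumed definition for illustration; not the paper's equation.
    """
    model_labels = np.asarray(model_labels)
    surrogate_labels = np.asarray(surrogate_labels)
    return float(np.mean(model_labels == surrogate_labels))


def simplicity_count(attributions, threshold=0.05):
    """Number of features whose share of total absolute attribution is >= threshold.

    Fewer retained features means a simpler explanation. The threshold value
    is a hypothetical default.
    """
    magnitudes = np.abs(np.asarray(attributions, dtype=float))
    total = magnitudes.sum()
    if total == 0:
        return 0  # no feature carries any attribution
    return int(np.sum(magnitudes / total >= threshold))


def rule_precision_coverage(rule_mask, model_labels, anchored_label):
    """Precision and coverage of a single rule over a reference sample.

    rule_mask[i] is True when instance i satisfies the rule.
    Coverage: fraction of instances the rule applies to.
    Precision: among covered instances, fraction the model assigns anchored_label.
    """
    rule_mask = np.asarray(rule_mask, dtype=bool)
    model_labels = np.asarray(model_labels)
    coverage = float(rule_mask.mean())
    if not rule_mask.any():
        return float("nan"), coverage  # precision is undefined for an empty rule
    precision = float(np.mean(model_labels[rule_mask] == anchored_label))
    return precision, coverage


if __name__ == "__main__":
    print(decision_fidelity([1, 0, 1, 1], [1, 0, 0, 1]))
    print(simplicity_count([0.5, -0.3, 0.01, 0.19], threshold=0.05))
    print(rule_precision_coverage(
        [True, True, False, True, False], [1, 1, 0, 0, 1], anchored_label=1))
```

Under these assumed definitions, tightening a rule so that fewer instances satisfy it lowers coverage and can raise precision, which is the trade-off the abstract reports for Anchors.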

Authors

  • M. Atif Qureshi; Abdul Aziz Noor; Awais Manzoor; Muhammad Deedahwar Mazhar Qureshi; Arjumand Younus; Wael Rashwan