Toward Faithful Neural Network Intrinsic Interpretation With Shapley Additive Self-Attribution
Journal:
IEEE Transactions on Neural Networks and Learning Systems
Published Date:
May 29, 2025
Abstract
Self-interpreting neural networks have attracted significant attention from the research community. Along this line, numerous works inherently share the intuitive principle of linear contribution aggregation from diverse perspectives, yet they often: 1) lack a solid theoretical foundation ensuring genuine interpretability and 2) compromise model expressiveness. In response, we propose a generic additive self-attribution (ASA) framework that encapsulates the characteristics of various works in this field and highlights the absence of Shapley value attribution among them. To fill this gap, we propose a novel Shapley additive self-attributing neural network (SASANet). SASANet models meaningful outputs for an arbitrary number of observable features, naturally yielding an unapproximated value function for the Shapley value. By designing an intermediate sequential schema based on marginal contributions (MCs) and an internal distillation procedure, we theoretically prove that the intermediate self-attribution values converge to the output's Shapley values. Finally, we conduct extensive experiments on multiple public datasets. The results demonstrate that SASANet, while highly interpretable, outperforms existing self-attributing models and is comparable to commonly adopted closed-box models. In addition, compared with post hoc interpretation methods, SASANet's self-attribution provides a more accurate and efficient interpretation of its own predictions. To the best of the authors' knowledge, this is the first self-interpreting neural network structure that achieves modelwise Shapley attribution. Our code is available at: https://anonymous.4open.science/r/SASANet-B343.
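As background for the ASA framing above: the Shapley value of feature i under a value function v over feature set N is phi_i = sum over S in N\{i} of |S|!(|N|-|S|-1)!/|N|! * (v(S ∪ {i}) - v(S)), and the additive form means the prediction decomposes as the sum of per-feature attributions. The sketch below is a minimal PyTorch illustration of that generic additive self-attribution form only; the module, its names, and its architecture are illustrative assumptions, not the authors' SASANet implementation.

```python
import torch
import torch.nn as nn


class AdditiveSelfAttribution(nn.Module):
    """Minimal additive self-attribution (ASA) sketch: the prediction is the
    sum of per-feature attribution scores plus a global bias, so each phi_i
    is, by construction, feature i's exact contribution to the output.
    Illustrative only; this is not the paper's SASANet architecture."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        # A scoring network shared across features: each feature's value,
        # concatenated with a learned index embedding, maps to one scalar.
        self.feature_embed = nn.Embedding(n_features, hidden)
        self.scorer = nn.Sequential(
            nn.Linear(hidden + 1, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        self.bias = nn.Parameter(torch.zeros(1))
        self.n_features = n_features

    def forward(self, x: torch.Tensor):
        # x: (batch, n_features). Build per-feature inputs [embedding | value].
        idx = torch.arange(self.n_features, device=x.device)
        emb = self.feature_embed(idx).expand(x.size(0), -1, -1)  # (B, F, H)
        inp = torch.cat([emb, x.unsqueeze(-1)], dim=-1)          # (B, F, H+1)
        phi = self.scorer(inp).squeeze(-1)                       # (B, F) attributions
        y = phi.sum(dim=-1) + self.bias                          # additive aggregation
        return y, phi


model = AdditiveSelfAttribution(n_features=10)
x = torch.randn(4, 10)
y, phi = model(x)
# Efficiency-style check: attributions sum exactly to the output minus bias,
# which is the property the ASA framework builds in by construction.
assert torch.allclose(y, phi.sum(-1) + model.bias)
```

Note that this additive form alone does not make the attributions Shapley values; the paper's contribution is the intermediate marginal-contribution schema and distillation procedure that provably drive the self-attribution values to the Shapley values.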