Counterfactual Diffusion Models for Interpretable Explanations of Artificial Intelligence Models in Pathology

Journal: bioRxiv
Published Date:

Abstract

Deep learning can extract predictive and prognostic biomarkers from histopathology whole slide images. However, explainable artificial intelligence approaches widely used in digital pathology, such as attention heatmaps and class activation mapping, offer only limited insight into the features that classifiers capture. Here, we present MoPaDi (Morphing histoPathology Diffusion), a framework for generating counterfactual explanations for histopathology images that reveal which morphological or style features drive classifier predictions. MoPaDi combines diffusion autoencoders with task-specific multiple instance learning classifiers to manipulate images and flip predictions by modifying relevant features. We evaluated the framework on multiple datasets spanning colorectal, breast, liver, and lung cancers, including tissue type, cancer subtype, and biomarker (microsatellite instability) classification tasks. We assessed counterfactual explanations through quantitative analyses, pathologists’ evaluations, and independent foundation model-based classifiers. MoPaDi generated realistic counterfactual histopathology images, enabling pathologists to identify morphological features associated with the change in model predictions. Unlike the conventional review of highly attended regions typical in digital pathology, MoPaDi explanations enabled pathologists to directly identify morphological features driving the classifier’s predictions from a limited number of top-contributing tiles. Consistent with the literature, our biomarker classifier associated high microsatellite instability with mucinous differentiation, glandular patterns, and lymphocytic infiltration. Furthermore, MoPaDi revealed that changes in classifier predictions were mainly driven by morphological alterations rather than staining differences. Overall, MoPaDi is a practical framework for counterfactual explanations in computational pathology that reveals model-specific drivers of classification and increases trust in deep learning models.
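
The abstract describes flipping a classifier's prediction by modifying the features of an encoded image. As a rough illustration only, the sketch below assumes the simplest version of that idea: a latent vector is shifted along the weight direction of a linear (logistic) classifier until the predicted class changes. The abstract does not confirm that MoPaDi works this way. The function names, the toy latent, and the weights are all hypothetical, and the diffusion autoencoder that would encode an image into the latent and decode the shifted latent back into an image is left out.

```python
import math


def predict_proba(z, w, b):
    """Positive-class probability of a toy logistic classifier on latent z."""
    logit = sum(zi * wi for zi, wi in zip(z, w)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def counterfactual_latent(z, w, b, step=0.3, max_steps=1000):
    """Shift z along the normalized weight direction until the class flips.

    Hypothetical helper for illustration; returns (shifted_latent, steps_taken).
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("classifier weights must not be all zero")
    direction = [wi / norm for wi in w]

    starts_positive = predict_proba(z, w, b) >= 0.5
    # Move against the weight vector to leave the positive class, along it otherwise.
    sign = -1.0 if starts_positive else 1.0

    z_cf = list(z)
    for n in range(1, max_steps + 1):
        z_cf = [zi + sign * step * di for zi, di in zip(z_cf, direction)]
        if (predict_proba(z_cf, w, b) >= 0.5) != starts_positive:
            return z_cf, n
    raise RuntimeError("prediction did not flip within max_steps")


# Toy values (assumed, not taken from the paper).
z_original = [1.0, 2.0]
weights = [3.0, 4.0]
bias = 0.0
z_counterfactual, steps_taken = counterfactual_latent(z_original, weights, bias)
```

In a full pipeline, the shifted latent would be decoded back into an image so that a pathologist can compare the original tile with its counterfactual. The step size sets how far the latent moves between checks, which in turn controls how gradual the visual change would appear.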

Authors

  • Laura Žigutytė; Tim Lenz; Tianyu Han; Katherine Jane Hewitt; Nic Gabriel Reitsam; Sebastian Foersch; Zunamys Itzell Carrero; Michaela Unger; Asier Rabasco Meneghetti; Alexander T. Pearson; Daniel Truhn; Jakob Nikolas Kather