COMIX: Compositional Explanations using Prototypes
Journal:
arXiv
Published Date:
Jan 10, 2025
Abstract
Aligning machine representations with human understanding is key to improving
interpretability of machine learning (ML) models. When classifying a new image,
humans often explain their decisions by decomposing the image into concepts and
pointing to corresponding regions in familiar images. Current ML explanation
techniques typically either trace decision-making processes to reference
prototypes, generate attribution maps highlighting feature importance, or
incorporate intermediate bottlenecks designed to align with human-interpretable
concepts. The proposed method, named COMIX, classifies an image by decomposing
it into regions based on learned concepts and tracing each region to
corresponding regions in images from the training dataset, ensuring that
explanations fully represent the actual decision-making process. We dissect the
test image into selected internal representations of a neural network to derive
prototypical parts (primitives) and match them with the corresponding
primitives derived from the training data. We prove theoretically, and
demonstrate in a series of qualitative and quantitative experiments, that our
method, in contrast to post hoc analysis, provides faithful explanations, and
we show that its efficiency is competitive with other inherently interpretable
architectures. Notably, it shows substantial improvements in fidelity and
sparsity metrics, including a 48.82% improvement in the C-insertion score on the
ImageNet dataset over the best state-of-the-art baseline.
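
The matching step described in the abstract (tracing each region of a test image to a corresponding primitive from the training data, then classifying from those matches) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `match_primitives` and `classify_by_primitives`, the use of cosine similarity, and the majority vote over matched labels are all assumptions chosen for illustration, and region features are assumed to be already extracted as row vectors.

```python
import numpy as np


def match_primitives(test_feats, train_feats):
    """Return, for each test-region feature, the index of the most similar
    training primitive and the similarity value.

    Cosine similarity is an assumption made for this sketch.
    """
    test_unit = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    train_unit = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    sim = test_unit @ train_unit.T          # shape: (n_test_regions, n_train_primitives)
    nearest = sim.argmax(axis=1)            # best training primitive per test region
    return nearest, sim[np.arange(len(nearest)), nearest]


def classify_by_primitives(test_feats, train_feats, train_labels, n_classes):
    """Predict a class by majority vote over the labels of matched primitives
    (hypothetical decision rule), returning the matches as the explanation."""
    nearest, scores = match_primitives(test_feats, train_feats)
    votes = np.bincount(train_labels[nearest], minlength=n_classes)
    return int(votes.argmax()), nearest, scores
```

Because the prediction in this sketch is computed only from the matched primitives, the returned indices are the complete evidence for the decision, which is the property the abstract contrasts with post hoc analysis.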