Counterfactual Explanation Through Latent Adjustment in Disentangled Space of Diffusion Model.
Journal:
IEEE Transactions on Neural Networks and Learning Systems
Published Date:
Jun 24, 2025
Abstract
With the rise of explainable artificial intelligence (XAI), counterfactual (CF) explanations have gained significant attention. Effective CFs must be valid (classified as the CF class), practical (deviating minimally from the input), and plausible (close to the CF data manifold). However, practicality and plausibility often conflict, making valid CF generation challenging. To address this, we propose a novel framework that generates CFs by adjusting only the semantic information in the disentangled latent space of a diffusion model, which shifts the sample closer to the CF manifold and across the decision boundary. In our framework, the latent vector mapping step occasionally produces CFs that are invalid or that are not sufficiently close to the decision boundary and are therefore dissimilar to the input. Our method overcomes this with a two-stage latent vector adjustment: 1) linear interpolation and 2) time-step-wise optimization during reverse diffusion, both performed within a latent space that accommodates linear changes in class information from the input. Experiments demonstrate that our approach generates more valid, plausible, and practical CFs by effectively leveraging the properties of the disentangled latent space.
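The abstract describes the two-stage adjustment only at a high level. The following is a minimal PyTorch sketch of how such a procedure could look; the `encoder`, `encoder.class_prototype`, and `diffusion.denoise_step` interfaces, the loss terms, and the hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def generate_counterfactual(x, target_class, encoder, diffusion, classifier,
                            alpha=0.5, steps=50, lr=1e-2, lam=1.0):
    """Two-stage latent adjustment toward a CF of class `target_class` (sketch)."""
    # Map the input into the (assumed) disentangled semantic latent space.
    z_input = encoder(x).detach()
    # Hypothetical helper returning a representative latent of the CF class.
    z_target = encoder.class_prototype(target_class).detach()

    # Stage 1: linear interpolation toward the CF-class latent, relying on the
    # assumed linearity of class information in the disentangled space.
    z = torch.lerp(z_input, z_target, alpha).requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)

    # Stage 2: time-step-wise optimization of `z` during reverse diffusion.
    x_t = torch.randn_like(x)  # start reverse diffusion from Gaussian noise
    for t in reversed(range(steps)):
        x_t = diffusion.denoise_step(x_t, t, cond=z)  # assumed conditional step

        # Validity: push the decoded sample toward the CF class.
        # Practicality: keep the semantic latent close to the input's latent.
        target = torch.full((x_t.shape[0],), target_class, dtype=torch.long)
        loss = F.cross_entropy(classifier(x_t), target) + lam * F.mse_loss(z, z_input)

        opt.zero_grad()
        loss.backward()
        opt.step()
        x_t = x_t.detach()  # stop gradients from flowing across time steps

    return x_t
```

In this sketch, validity is encouraged by the classifier loss, practicality by the proximity term on the semantic latent, and plausibility by generating through the diffusion model's reverse process rather than perturbing pixels directly.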