X-SiT: Inherently Interpretable Surface Vision Transformers for Dementia Diagnosis
Journal:
arXiv
Published Date:
Jun 25, 2025
Abstract
Interpretable models are crucial for supporting clinical decision-making,
driving advances in their development and application for medical images.
However, the nature of 3D volumetric data makes it inherently challenging to
visualize and interpret intricate and complex structures like the cerebral
cortex. Cortical surface renderings, on the other hand, provide a more
accessible and understandable 3D representation of brain anatomy, facilitating
visualization and interactive exploration. Motivated by this advantage and the
widespread use of surface data for studying neurological disorders, we present
the eXplainable Surface Vision Transformer (X-SiT). This is the first
inherently interpretable neural network that offers human-understandable
predictions based on interpretable cortical features. As part of X-SiT, we
introduce a prototypical surface patch decoder for classifying surface patch
embeddings, incorporating case-based reasoning with spatially corresponding
cortical prototypes. The results demonstrate state-of-the-art performance in
detecting Alzheimer's disease and frontotemporal dementia while additionally
providing informative prototypes that align with known disease patterns and
reveal classification errors.