MultiViT2: A Data-augmented Multimodal Neuroimaging Prediction Framework via Latent Diffusion Model
Journal: arXiv
Published Date: Jun 16, 2025
Abstract
Multimodal medical imaging integrates diverse data types, such as structural
and functional neuroimaging, to provide complementary insights that enhance
deep learning predictions and improve outcomes. This study presents a
prediction framework that uses both structural and functional neuroimaging
data. We propose a next-generation prediction model, MultiViT2, which
combines a pretrained representation learning base model with a vision
transformer backbone for prediction output. Additionally,
we develop a data augmentation module based on a latent diffusion model
that enriches input data by generating augmented neuroimaging samples, thereby
enhancing predictive performance through reduced overfitting and improved
generalizability. We show that MultiViT2 significantly outperforms the
first-generation model in schizophrenia classification accuracy and
demonstrates strong scalability and portability.