Diagnosis of Multiple Sclerosis Using Multimodal Deep Learning Integrating Lesion and Normal-Appearing White Matter: A Retrospective Study with International Multicentre External Validation
Journal: medRxiv
Published Date: Mar 10, 2026
Abstract
Background: Current diagnostic criteria for multiple sclerosis (MS) rely on white matter lesions (WMLs), which are not specific to MS and often occur in other disorders. Microstructural abnormalities in normal-appearing white matter (NAWM) may provide complementary information beyond focal lesions. However, the diagnostic use of NAWM in MS remains limited because a reproducible, diagnostically specific NAWM signature has not been established, and detecting NAWM abnormalities typically requires quantitative MRI methods beyond routine clinical MRI protocols.
Methods: In this retrospective study, we propose DeepMS, a deep learning model trained with both quantitative diffusion MRI (dMRI) and structural MRI (sMRI) to diagnose MS by integrating WML and NAWM features captured from routine MRI alone. Model development used 8,450 scans from 7,703 patients (NYU Langone/ADNI). Evaluation included an internal test set (n=837) and two independent external cohorts: the Krakow cohort (Poland, n=293) and a public multi-site cohort curated from 15 datasets (n=1,756). We compared DeepMS against 2024 McDonald criteria biomarkers (Dissemination in Time [DIT], Dissemination in Space [DIS], Central Vein Sign [CVS], and Paramagnetic Rim Lesion [PRL]) in a multireader study (n=308). To validate the model's use of NAWM, we performed lesion-masking experiments (n=550), comparing performance after removal of focal lesions.
Findings: DeepMS achieved high areas under the receiver operating characteristic curve (AUCs) in the internal (0.968 [95% CI 0.946-0.987]), Krakow (0.940 [0.898-0.974]), and public external (0.974 [0.966-0.982]) cohorts. In the multireader study, DeepMS outperformed established biomarkers: at matched sensitivity (92.9%), DeepMS achieved higher specificity than DIS (89.0% vs 78.5%; p=0.0061); at matched specificity (92.8%), DeepMS achieved higher sensitivity than CVS (88.2% vs 52.0%; p<0.0001). Furthermore, DeepMS retained more of its diagnostic capability after WML masking (AUC from 0.959 to 0.881) than the model trained with sMRI only (AUC from 0.895 to 0.764).
Interpretation: Our findings suggest that deep learning models can feasibly leverage NAWM-related information directly from routine sMRI. Integrating these features could aid MS diagnosis in patients with ambiguous white matter abnormalities.
Funding: National Institute of Neurological Disorders and Stroke, the National Institute of Biomedical Imaging and Bioengineering, and the Irma T. Hirschl Trust.
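The matched-sensitivity comparison reported in the Findings can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `threshold_for_sensitivity` and `sensitivity_specificity` are hypothetical, the scores and labels are toy values rather than study data, and the statistical test behind the reported p-values is not reproduced. The idea, assuming a model that outputs one continuous score per patient, is to pick the score threshold at which the model reaches the same sensitivity as a binary biomarker and then read off the specificity at that threshold.

```python
import math
from typing import Sequence, Tuple


def threshold_for_sensitivity(
    scores: Sequence[float], labels: Sequence[int], target: float
) -> float:
    """Return the highest threshold whose sensitivity is at least `target`.

    Labels are assumed to be 1 (MS) or 0 (non-MS); a case is called
    positive when its score is >= the threshold.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("at least one positive case is required")
    # Smallest number of positives that must be flagged to reach the target.
    # The epsilon guards against floating-point overshoot (e.g. 0.7 * 10).
    k = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[k - 1]


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) at a given score threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both positive and negative cases are required")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp / n_pos, tn / n_neg


# Toy data (illustrative only, not from the study).
toy_scores = [0.95, 0.90, 0.80, 0.40, 0.70, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Suppose a binary biomarker has sensitivity 0.75 on the same cases.
matched_threshold = threshold_for_sensitivity(toy_scores, toy_labels, 0.75)
sens, spec = sensitivity_specificity(toy_scores, toy_labels, matched_threshold)
print(f"threshold={matched_threshold}, sensitivity={sens}, specificity={spec}")
```

The matched-specificity comparison is symmetric: under the same assumptions, one would choose the threshold from the sorted scores of the negative cases and report the sensitivity at that threshold.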