MLASDO: a software tool to detect and explain clinical and omics inconsistencies applied to the Parkinson's Progression Markers Initiative cohort
Journal: arXiv
Published Date: Jul 4, 2025
Abstract
Inconsistencies between clinical and omics data may arise within medical
cohorts. Identifying, annotating and explaining individuals whose omics
profiles are anomalous may be crucial to better characterize the disease,
e.g., by detecting early onset that is signaled by the omics data but not yet
detectable from observable symptoms. Here, we developed MLASDO (Machine Learning based
Anomalous Sample Detection on Omics), a new method and software tool to
identify, characterize and automatically describe anomalous samples based on
omics data. Its workflow consists of three steps: (1) classification of healthy
and case individuals using a support vector machine algorithm; (2) detection
of anomalous samples within groups; (3) explanation of anomalous individuals
based on clinical data and expert knowledge. We showcase MLASDO using
transcriptomics data of 317 healthy controls (HC) and 465 Parkinson's disease
(PD) cases from the Parkinson's Progression Markers Initiative. In this cohort,
MLASDO detected 15 anomalous HC with a PD-like transcriptomic signature and
PD-like clinical features, including a lower proportion of CD4/CD8 naive
T-cells and CD4 memory T-cells compared to HC (P<3.5*10^-3). MLASDO also
identified 22 anomalous PD cases with a transcriptomic signature more similar
to that of HC and some clinical features more similar to HC, including a lower
proportion of mature neutrophils compared to PD cases (P<6*10^-3). In summary,
MLASDO can help clinicians detect and explain anomalous HC and cases that
warrant follow-up. MLASDO is an open-source
R package available at: https://github.com/JoseAdrian3/MLASDO.
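
The three-step workflow described in the abstract can be illustrated with a minimal sketch. This is not the MLASDO implementation or its API (MLASDO is an R package; the sketch is plain Python for self-containment). The abstract does not state how anomalous samples are detected, so the rule used here is an assumption: a sample is flagged when a linear SVM trained on the omics features places it on the side opposite to its clinical label. All function names, the toy feature values and the `toy_marker` covariate are hypothetical and carry no relation to the PPMI data.

```python
from statistics import mean


def decision_value(w, b, x):
    """Signed score of a linear classifier for one sample."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=500):
    """Step 1 (sketch): linear SVM fitted by full-batch subgradient descent
    on the L2-regularised hinge loss. Labels must be -1 (HC) or +1 (case)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision_value(w, b, xi) < 1:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def flag_anomalies(X, y, w, b):
    """Step 2 (assumed rule): flag samples whose omics-based score
    disagrees in sign with their clinical label."""
    return [i for i, (xi, yi) in enumerate(zip(X, y))
            if yi * decision_value(w, b, xi) < 0]


def compare_covariate(values, y, flagged, group):
    """Step 3 (sketch): mean of a clinical covariate in anomalous vs.
    typical members of one group (group is -1 or +1)."""
    flagged_set = set(flagged)
    anomalous = [v for i, v in enumerate(values)
                 if y[i] == group and i in flagged_set]
    typical = [v for i, v in enumerate(values)
               if y[i] == group and i not in flagged_set]
    return mean(anomalous), mean(typical)


# Toy two-feature "omics" profiles (hypothetical values).
X = [
    [-2.0, -1.5], [-1.5, -2.0], [-2.5, -2.0], [-2.0, -2.5], [-1.8, -1.7],
    [2.0, 1.8],    # index 5: labelled HC but with a case-like profile
    [2.0, 1.5], [1.5, 2.0], [2.5, 2.0], [2.0, 2.5], [1.8, 1.7],
    [-2.0, -1.8],  # index 11: labelled case but with an HC-like profile
]
y = [-1] * 6 + [1] * 6

# Hypothetical clinical covariate, one value per sample.
toy_marker = [1.0, 1.1, 0.9, 1.0, 1.0, 0.5,
              0.4, 0.5, 0.6, 0.5, 0.5, 1.0]

w, b = train_linear_svm(X, y)
flagged = flag_anomalies(X, y, w, b)
hc_anomalous_mean, hc_typical_mean = compare_covariate(toy_marker, y, flagged, -1)

print("flagged sample indices:", flagged)
print("HC marker mean (anomalous vs. typical):", hc_anomalous_mean, hc_typical_mean)
```

In this toy setting the two deliberately mislabelled profiles are the ones flagged, and the comparison of `toy_marker` stands in for the clinical characterization step. A real analysis would need held-out predictions (for example, cross-validation) so that samples are not scored by a model trained on them, and a proper statistical test in place of the comparison of means.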