A preregistered, open pipeline for early cerebral palsy risk assessment from Infant Videos.

Journal: GigaScience

Published Date: Jan 20, 2026

Abstract

Cerebral palsy (CP) is a disorder of movement control caused by abnormal brain development that affects approximately 1 in 500 children. Early risk assessment via the General Movements Assessment (GMA) at 3-4 months of age is highly predictive of CP but relies on trained clinicians. Machine-learning approaches for predicting GMA scores from video have shown considerable promise, but they typically rely on dataset-specific preprocessing, custom feature sets, and manually designed model pipelines, which makes external benchmarking difficult. Combined with strict privacy constraints on data sharing, this makes it challenging to train and evaluate models across datasets, a step that is important for assessing clinical utility. There is therefore a need for approaches that work across datasets and so enable multi-site data aggregation and model training. To address this gap, we developed an end-to-end pipeline that uses off-the-shelf pose estimation, general-purpose feature extraction, and automated machine learning (AutoML), none of which is tuned to a specific dataset. We applied this approach to a newly generated large dataset of 1053 infants (approximately 10-12% positive class for adverse GMA outcome, drawn from a high-risk clinical cohort) within a preregistered study design. Model performance was evaluated on a strict "lock-box" test set, which remained untouched during all phases of model development and preprocessing optimization and was used for evaluation only once the final model and pipeline had been preregistered. The developed model achieved moderate predictive accuracy for clinician-assessed GMA scores (area under the receiver operating characteristic curve, ROC-AUC = 0.77; area under the precision-recall curve, PR-AUC = 0.41). This moderate accuracy is noteworthy given the 10-12% positive-class prevalence and the power-law scaling of ROC-AUC with increasing dataset size.
By releasing de-identified feature data and open-source code, and simplifying the training pipeline using AutoML, our work establishes essential groundwork for future robust, globally relevant CP screening tools suitable for low-resource settings.
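
The two evaluation ideas in the abstract, a held-out "lock-box" set and the reading of ROC-AUC and PR-AUC under class imbalance, can be illustrated with a minimal Python sketch. It is not the authors' code. The hash-based split, the 20% lock-box fraction, the `infant_####` identifiers, and all function names are assumptions made for illustration, and the data are synthetic. The sketch shows that an uninformative model scores a PR-AUC close to the positive-class prevalence (about 0.11 here) and a ROC-AUC close to 0.5, which is the baseline against which the reported 0.41 and 0.77 are read.

```python
import hashlib
import random


def in_lockbox(infant_id: str, fraction: float = 0.2) -> bool:
    """Assign an ID to the held-out set by hashing it (hypothetical scheme).

    The assignment depends only on the ID, so it cannot drift while the
    model and preprocessing are being developed.
    """
    digest = hashlib.sha256(infant_id.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000 < fraction


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Tied scores are ranked in arbitrary order in this simple version.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Small worked example with known values.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
print(round(average_precision(toy_labels, toy_scores), 4))  # 0.8333

# Synthetic cohort: 1053 IDs, roughly 11% positives, uninformative scores.
rng = random.Random(0)
ids = [f"infant_{i:04d}" for i in range(1053)]
labels = [1 if rng.random() < 0.11 else 0 for _ in ids]
scores = [rng.random() for _ in ids]

lockbox = [i for i in ids if in_lockbox(i)]
prevalence = sum(labels) / len(labels)

print(f"lock-box size: {len(lockbox)} of {len(ids)}")
print(f"prevalence: {prevalence:.3f}")
print(f"chance ROC-AUC: {roc_auc(labels, scores):.3f}")
print(f"chance PR-AUC: {average_precision(labels, scores):.3f}")
```

Hashing the identifier rather than drawing a random split is only one possible design. Its appeal is that the lock-box membership can be fixed and published before any model is trained, in the spirit of a preregistered evaluation.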

Authors

Melanie Segado

Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA.
Laura A Prosser

Department of Physical Medicine and Rehabilitation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Andrea F Duncan

Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Michelle J Johnson

Sofiya Lysenko is a research assistant working in Rehabilitation Robotics Lab, University of Pennsylvania, Philadelphia, PA 19104, USA.
Konrad P Kording

Departments of Bioengineering and Neuroscience, University of Pennsylvania, Philadelphia, PA.

External Resources

PubMed ID (PMID): 41556563
