A preregistered, open pipeline for early cerebral palsy risk assessment from infant videos

Journal: GigaScience

Abstract

Cerebral palsy (CP) is a disorder of movement control caused by abnormal brain development and affects approximately 1 in 500 children. Early risk assessment with the General Movements Assessment (GMA) at 3-4 months of age is highly predictive of CP but relies on trained clinicians. Machine-learning approaches that predict GMA scores from video have shown considerable promise, but they typically rely on dataset-specific preprocessing, custom feature sets, and manually designed model pipelines, which make external benchmarking difficult. Combined with strict privacy constraints on data sharing, this makes it challenging to train and evaluate models across datasets, which is important for assessing clinical utility. Approaches that work across datasets are therefore needed to enable multi-site data aggregation and model training. To address this gap, we developed an end-to-end pipeline that uses off-the-shelf pose estimation, general-purpose feature extraction, and automated machine learning (AutoML), none of which are tuned to a specific dataset. Within a preregistered study design, we applied this approach to a newly generated large dataset of 1053 infants drawn from a high-risk clinical cohort, of whom approximately 10-12% had an adverse GMA outcome (the positive class). Model performance was evaluated on a strict "lock-box" test set that remained untouched during every phase of model development and preprocessing optimization, and was used for evaluation only after the final model and pipeline had been preregistered. The final model achieved moderate predictive accuracy for clinician-assessed GMA scores (area under the receiver operating characteristic curve, ROC-AUC = 0.77; area under the precision-recall curve, PR-AUC = 0.41). This moderate accuracy is noteworthy given the 10-12% positive-class prevalence, and ROC-AUC followed power-law scaling as dataset size increased.
By releasing de-identified feature data and open-source code, and by simplifying the training pipeline with AutoML, our work lays essential groundwork for future robust, globally relevant CP screening tools suitable for low-resource settings.
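With only 10-12% positives, ROC-AUC and PR-AUC have different chance levels: an uninformative classifier scores about 0.5 ROC-AUC but only about the positive-class prevalence (roughly 0.1) in PR-AUC. The sketch below illustrates this, together with a stratified lock-box hold-out that is set aside before any model development. It is not the authors' pipeline. The function names, the 20% lock-box fraction, the synthetic labels and scores, and the use of average precision as the PR-AUC estimator are all assumptions made for illustration. It uses only the Python standard library.

```python
import random
from typing import List, Sequence, Tuple


def lockbox_split(
    labels: Sequence[int], frac: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical stratified split returning (development_idx, lockbox_idx).

    Each class is shuffled separately so the lock-box keeps roughly the
    cohort's positive-class prevalence. The lock-box indices are meant to be
    set aside once and scored only after the pipeline is frozen.
    """
    rng = random.Random(seed)
    dev: List[int] = []
    lock: List[int] = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_lock = round(len(idx) * frac)
        lock.extend(idx[:n_lock])
        dev.extend(idx[n_lock:])
    return sorted(dev), sorted(lock)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """PR-AUC estimated as average precision: mean precision at the rank of each positive.

    Assumes 0/1 labels and no tied scores (ties are broken by input order).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    rng = random.Random(42)
    # 1053 matches the cohort size in the abstract; 0.11 is an assumed midpoint of 10-12%.
    n, prevalence = 1053, 0.11
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    dev_idx, lock_idx = lockbox_split(labels, frac=0.2, seed=1)
    # Uninformative random scores stand in for a model with no signal.
    scores = [rng.random() for _ in range(n)]
    y_lock = [labels[i] for i in lock_idx]
    s_lock = [scores[i] for i in lock_idx]
    print(f"development: {len(dev_idx)}, lock-box: {len(lock_idx)}, lock-box positives: {sum(y_lock)}")
    print(f"chance ROC-AUC: {roc_auc(y_lock, s_lock):.2f} (expected near 0.5)")
    print(f"chance PR-AUC:  {average_precision(y_lock, s_lock):.2f} "
          f"(expected near the prevalence, {sum(y_lock) / len(y_lock):.2f})")
```

Against these chance levels, the reported PR-AUC of 0.41 is roughly three to four times its prevalence baseline, whereas ROC-AUC = 0.77 is measured against a baseline of 0.5 regardless of prevalence. The pairwise ROC-AUC above is quadratic in the number of samples. For larger sets, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be preferable.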

Authors

  • Melanie Segado
    Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA.
  • Laura A Prosser
    Department of Physical Medicine and Rehabilitation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Andrea F Duncan
    Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Michelle J Johnson
    Rehabilitation Robotics Lab, University of Pennsylvania, Philadelphia, PA 19104, USA.
  • Konrad P Kording
    Departments of Bioengineering and Neuroscience, University of Pennsylvania, Philadelphia, PA, USA.