Longitudinal image-based prediction of surgical intervention in infants with hydronephrosis using deep learning: Is a single ultrasound enough?
Journal:
PLOS Digital Health
Published Date:
Aug 4, 2025
Abstract
The potential of deep learning to predict renal obstruction from kidney ultrasound images has been demonstrated. However, these image-based classifiers have used information from single-visit ultrasounds only. Here, we developed machine learning (ML) models that incorporate ultrasounds from multiple clinic visits for hydronephrosis to generate a hydronephrosis severity index score that stratifies patients into high versus low risk of needing pyeloplasty, and we compared these against models trained with data from a single clinic visit. We included patients followed for hydronephrosis at three institutions. The outcome of interest was low risk versus high risk of obstructive hydronephrosis requiring pyeloplasty. The model was trained on data from Toronto, ON, validated on an internal holdout set, and tested on an internal prospective set and on data from two external institutions. We developed models trained with a single ultrasound (single-visit) and multi-visit models using average prediction, convolutional pooling, long short-term memory (LSTM), and temporal shift models. We compared model performance by area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). A total of 794 patients were included (603 SickKids, 102 Stanford, and 89 CHOP), with pyeloplasty rates of 12%, 5%, and 67%, respectively. There was no significant difference between single-visit ultrasound models developed using the first ultrasound and those developed using the latest ultrasound. Comparing single-visit with multi-visit models, no multi-visit model produced an AUROC or AUPRC significantly greater than that of the single-visit models. We developed ML models for hydronephrosis that incorporate multi-visit inference across multiple institutions but did not demonstrate superiority over single-visit inference. These results imply that single-visit models would be sufficient to aid accurate risk stratification from a single, early ultrasound image.
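To make the comparison concrete, the following is a minimal sketch of the simplest multi-visit strategy named above, average prediction, evaluated by AUROC against a single-visit baseline. It is not the authors' implementation. The function names, the patient records, and all scores are hypothetical and made up purely for illustration, and the AUROC here is computed with the standard pairwise (Mann-Whitney) formulation rather than any particular library.

```python
from statistics import mean


def average_prediction(visit_scores):
    """Combine one patient's per-visit risk scores by simple averaging."""
    return mean(visit_scores)


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): label 1 = pyeloplasty,
# visit_scores = per-visit model outputs in chronological order.
patients = [
    {"label": 1, "visit_scores": [0.60, 0.80, 0.90]},
    {"label": 0, "visit_scores": [0.30, 0.30, 0.20]},
    {"label": 0, "visit_scores": [0.10, 0.20]},
    {"label": 1, "visit_scores": [0.20, 0.30]},
]

labels = [p["label"] for p in patients]

# Single-visit inference: use only the first ultrasound's score.
single_visit = [p["visit_scores"][0] for p in patients]

# Multi-visit inference: average the scores over all visits.
multi_visit = [average_prediction(p["visit_scores"]) for p in patients]

print("single-visit AUROC:", auroc(labels, single_visit))
print("multi-visit AUROC:", auroc(labels, multi_visit))
```

On this toy data both strategies give the same AUROC, which only illustrates the mechanics of the comparison; it says nothing about the study's actual results or about the significance testing the authors performed.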