Accurate prediction of mortality in children with sepsis: development and validation of an explainable model based on real-world data.
Journal:
Italian journal of pediatrics
Published Date:
Jun 3, 2026
Abstract
BACKGROUND: Sepsis remains the leading cause of in-hospital death among children, yet precise models for its early prediction are lacking. This study aimed to develop an interpretable machine learning (IML) model to predict in-hospital mortality in pediatric patients diagnosed with sepsis according to the 2024 Phoenix Sepsis Diagnostic Criteria. METHODS: In this single-center retrospective cohort study, 464,459 children hospitalized at the Children's Hospital Affiliated to Nanjing Medical University between January 2018 and December 2023 were screened. Using the Phoenix Sepsis Score, 2,477 children were ultimately diagnosed with sepsis. Features were selected with three methods (least absolute shrinkage and selection operator [LASSO] regression, the Boruta algorithm, and random forest importance ranking), yielding a final set of 12 features. Eight machine learning algorithms were trained on 70% of the data and evaluated on the remaining 30% (test set). The optimal model was selected based on test-set accuracy and the area under the receiver operating characteristic curve (AUC). Model interpretability was enhanced with SHapley Additive exPlanations (SHAP) summary plots, individual SHAP force plots, and partial dependence plots (PDPs). RESULTS: A total of 2,477 patients with sepsis met the inclusion criteria, with a median age of 26 months (IQR, 5-78 months); 1,448 (58.5%) were boys. Among the eight models, CatBoost performed best, achieving an AUC of 0.889 and an accuracy of 92.9% on the test set. At the default decision threshold, the model showed high specificity but relatively low sensitivity. Optimizing the decision threshold with the Youden index raised sensitivity to 83.05%, reducing the risk of missed diagnoses among high-risk patients in clinical practice.
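The threshold adjustment reported above uses the Youden index, J = sensitivity + specificity - 1, maximized over candidate cutoffs. The following is a minimal plain-Python sketch under stated assumptions: labels are binary (1 = in-hospital death), the model outputs predicted probabilities, and a 0.5 cutoff is taken as the "default" threshold. The toy data and the function names `sens_spec` and `youden_threshold` are hypothetical; the abstract does not describe how the study implemented this step.

```python
from typing import List, Optional, Tuple


def sens_spec(y_true: List[int], y_prob: List[float], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when cases with prob >= threshold are called positive.

    Assumes both classes are present in y_true (otherwise a denominator is zero).
    """
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(y_true: List[int], y_prob: List[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Candidate cutoffs are the distinct predicted probabilities, the only
    points where sensitivity or specificity can change.
    """
    best_t: Optional[float] = None
    best_j = float("-inf")
    for t in sorted(set(y_prob)):
        sens, spec = sens_spec(y_true, y_prob, t)
        j = sens + spec - 1
        if j > best_j:  # keep the first (lowest) cutoff on ties
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical toy test set: 8 survivors (0) and 4 deaths (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.30, 0.45, 0.20, 0.35, 0.55, 0.80]

print(sens_spec(y_true, y_prob, 0.5))      # (0.5, 1.0)
print(youden_threshold(y_true, y_prob))    # (0.2, 0.75)
print(sens_spec(y_true, y_prob, 0.2))      # (1.0, 0.75)
```

On this toy data the 0.5 cutoff gives sensitivity 0.5 with perfect specificity, while the Youden-optimal cutoff of 0.2 raises sensitivity to 1.0 at the cost of specificity 0.75, a miniature version of the trade-off described in the results.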
Feature importance analysis indicated that invasive mechanical ventilation, Glasgow Coma Scale score, platelet count, uric acid, and SpO2/FiO2 ratio were the five most influential features in the CatBoost model. SHAP analysis provided both global feature importance and individualized risk explanations, and the model has been deployed as a freely accessible web calculator. CONCLUSIONS: The IML model developed in this study offers an accurate and highly interpretable tool for the early prediction of in-hospital mortality in children with sepsis. The SHAP approach improves model interpretability and helps clinicians understand the factors driving individual predictions. However, because this was a single-center retrospective study, the generalizability of our findings to other clinical settings may be limited. Future multi-center prospective studies are needed to validate the model further and to address its current limitations in sensitivity, minimizing the risk of missing high-risk cases.
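SHAP explanations rest on Shapley values, which divide a single prediction among the features so that the attributions sum to the prediction minus a reference value. The sketch below computes exact Shapley values by enumerating every feature coalition for a tiny hypothetical risk score. The score, its weights, the binary flags `vent`, `low_gcs`, and `low_plt`, and the all-zero reference patient are invented for illustration and are not the study's CatBoost model. Replacing absent features with a single baseline is a simplifying assumption; SHAP implementations typically average over background data and use faster model-specific algorithms for tree ensembles.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Set


# Hypothetical toy risk score (NOT the study's model): the weights and the
# ventilation x low-GCS interaction term are invented for illustration.
def toy_risk(x: Dict[str, int]) -> int:
    return 2 * x["vent"] + x["low_gcs"] + x["low_plt"] + 2 * x["vent"] * x["low_gcs"]


def shapley_values(
    model: Callable[[Dict[str, int]], int],
    instance: Dict[str, int],
    baseline: Dict[str, int],
) -> Dict[str, Fraction]:
    """Exact Shapley values by enumerating all coalitions of features.

    Features outside a coalition take their baseline value; features inside
    take the instance's value. Fractions keep the weights exact.
    """
    features = list(instance)
    n = len(features)

    def value(coalition: Set[str]) -> int:
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return model(x)

    phi: Dict[str, Fraction] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = Fraction(0)
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing f
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))  # marginal contribution of f
        phi[f] = total
    return phi


patient = {"vent": 1, "low_gcs": 1, "low_plt": 1}
reference = {"vent": 0, "low_gcs": 0, "low_plt": 0}
phi = shapley_values(toy_risk, patient, reference)
print({f: str(v) for f, v in phi.items()})  # {'vent': '3', 'low_gcs': '2', 'low_plt': '1'}
```

The attributions 3, 2, and 1 sum to 6, the toy patient's score minus the reference score, and the interaction term is split equally between the two features that form it. Because the number of coalitions grows exponentially with the number of features, exhaustive enumeration only suits a handful of features.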