Development and Evaluation of an AI-Assisted, Privacy-Preserving Surgical Risk Calculator

Journal: medRxiv
Published Date:

Abstract

Large language models (LLMs) can generate functional code, yet their utility in developing clinical prediction tools has not been extensively explored. We evaluated GPT-4o’s ability to create a postoperative complication risk calculator similar to the existing National Surgical Quality Improvement Program (NSQIP) risk calculator. The task comprised data preprocessing, predictive modeling, and development of a web application. To maintain privacy, only synthetic data with a structure similar to the NSQIP dataset were used when communicating with GPT-4o. A total of 512 lines of Python code were generated across 14 prompts, with one line requiring human editing. The resulting logistic regression models achieved Brier scores similar to those of the original NSQIP risk calculator and demonstrated strong discrimination (C-statistic > 0.75), although they slightly underperformed previously reported predictive metrics for some outcomes. Development was completed in three hours. These findings suggest that LLMs can facilitate rapid development of clinical decision support tools, though their output still requires human oversight and refinement.
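For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how a Brier score and a C-statistic can be computed from predicted risks and observed outcomes. It is not the code generated in the study. The function names `brier_score` and `c_statistic` are hypothetical, and the six outcome/probability pairs are made-up illustrative values rather than NSQIP or study data.

```python
from itertools import product


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome.

    Lower is better; 0 means perfect predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def c_statistic(y_true, y_prob):
    """Fraction of (event, non-event) pairs in which the event case
    received the higher predicted risk; ties count as half a pair.

    0.5 corresponds to chance-level discrimination, 1.0 to perfect ranking.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e, n in product(events, non_events)
    )
    return concordant / (len(events) * len(non_events))


# Made-up illustrative values (assumption): 1 = complication occurred.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

brier = brier_score(y_true, y_prob)
c_stat = c_statistic(y_true, y_prob)
print(f"Brier score: {brier:.3f}, C-statistic: {c_stat:.3f}")
```

The pairwise C-statistic shown here is quadratic in the number of cases, which is fine for illustration; a rank-based computation would be the usual choice for a dataset of realistic size.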

Authors

  • Nathan Wolfrath; Gopika SenthilKumar; Adhitya Ramamurthi; Anai N. Kothari