DeepSeek-R1 performance in fusion planning of adolescent idiopathic scoliosis: A preliminary study.

Journal: Orthopaedics & traumatology, surgery & research : OTSR

Abstract

BACKGROUND: Surgical planning for adolescent idiopathic scoliosis (AIS) is complex. Large language models (LLMs) such as the DeepSeek Reasoning Model R1 (DeepSeek-R1) may offer decision support, but their accuracy in determining fusion levels has not been established.

HYPOTHESIS: DeepSeek-R1 can generate surgically reasonable fusion levels for AIS with clinically acceptable accuracy.

PATIENTS AND METHODS: This study enrolled 203 consecutive AIS patients who met surgical indications. Comprehensive clinical and radiological data, including the Lenke classification, were structured into standardized prompts. DeepSeek-R1 was tasked with determining the upper and lower instrumented vertebrae (UIV and LIV) for each case. Three experienced spinal surgeons independently rated its outputs on a 5-point Likert scale, with a score of ≥3 defined as reasonable. Inter-rater reliability was assessed with the intraclass correlation coefficient (ICC). Performance across Lenke subtypes was compared using Fisher's exact test with Monte Carlo simulation.

RESULTS: DeepSeek-R1 generated surgically reasonable fusion levels (Likert score ≥3) in 70.9% (144/203) of cases, with excellent inter-rater agreement (ICC = 0.840, 95% CI [0.798, 0.875]). Performance varied significantly by Lenke subtype: the rate of reasonable plans was high for types 1A (87.0%), 1B (82.9%), 5C (81.1%), and 6C (87.5%), but low for types 1C (19.0%), 2A (42.9%), and 2C (25.0%).

CONCLUSION: Overall, DeepSeek-R1 demonstrated clinically acceptable accuracy in planning AIS fusion levels, performing particularly well for specific Lenke curve patterns (e.g., 1A, 1B, 5C, 6C). However, its performance was inconsistent across subtypes, revealing limitations with complex curve patterns (notably 1C, 2A, 2C). Although promising as a decision-support tool, it requires further refinement and validation before clinical implementation.

LEVEL OF EVIDENCE: III.
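The abstract names its statistics but not the computation behind them. The Python sketch below is illustrative only and is not the authors' analysis code. It assumes each case's three Likert scores are averaged before the ≥3 cut-off is applied (the abstract does not say how the raters' scores were aggregated), and it estimates a Fisher's exact test p-value for an r × c subtype-by-outcome table by Monte Carlo, drawing random tables with the observed margins (the same idea as R's `fisher.test(..., simulate.p.value = TRUE)`). The function names `is_reasonable`, `reasonable_rate`, `log_table_prob` and `fisher_monte_carlo` are hypothetical, and the example counts are invented toy data, not study results. The ICC is not reproduced because the abstract does not state which ICC form was used.

```python
import math
import random

# Likert cut-off from the abstract: a score >= 3 counts as "reasonable".
REASONABLE_THRESHOLD = 3


def is_reasonable(scores, threshold=REASONABLE_THRESHOLD):
    """Hypothetical aggregation rule: average the raters' Likert scores for one
    case and compare the mean with the threshold. The abstract does not say how
    the three surgeons' scores were combined."""
    return sum(scores) / len(scores) >= threshold


def reasonable_rate(cases):
    """Fraction of cases (each a list of rater scores) judged reasonable."""
    flags = [is_reasonable(scores) for scores in cases]
    return sum(flags) / len(flags)


def _log_factorial(k):
    """log(k!) via the log-gamma function."""
    return math.lgamma(k + 1)


def log_table_prob(table):
    """Log-probability of an r x c table under independence with fixed row and
    column totals (multivariate hypergeometric):
    P = prod(R_i!) * prod(C_j!) / (n! * prod(n_ij!))."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    numerator = sum(map(_log_factorial, row_sums)) + sum(map(_log_factorial, col_sums))
    denominator = _log_factorial(n) + sum(_log_factorial(x) for row in table for x in row)
    return numerator - denominator


def fisher_monte_carlo(table, n_sim=10_000, seed=0):
    """Monte Carlo p-value for Fisher's exact test on an r x c table: the share
    of random tables with the observed margins (observed table included) that
    are no more probable than the observed table."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # One (row, column) label pair per observation, expanded from cell counts.
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)
             for _ in range(table[i][j])]
    row_labels = [i for i, _ in cells]
    col_labels = [j for _, j in cells]
    observed = log_table_prob(table)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(col_labels)  # keeps both row and column totals fixed
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        # Small tolerance so ties with the observed table count as hits.
        if log_table_prob(sim) <= observed + 1e-7:
            hits += 1
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Overall rate reported in the abstract: 144 of 203 cases.
    print(f"{144 / 203:.1%}")  # prints 70.9%
    # Hypothetical three-rater scores for four cases (not study data).
    print(reasonable_rate([[4, 3, 4], [2, 2, 3], [5, 4, 4], [3, 3, 2]]))  # prints 0.5
    # Hypothetical subtype x (reasonable, not reasonable) counts (not study data).
    toy_table = [[20, 3], [15, 5], [4, 17]]
    print(fisher_monte_carlo(toy_table, n_sim=2_000))
```

Shuffling the column labels of individual observations keeps every row and column total fixed, so each simulated table is a draw from the null distribution that Fisher's test conditions on. Adding one to both the hit count and the number of simulations keeps the estimated p-value from ever being exactly zero.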
