Performance of GPT-based large language models in hepatocellular carcinoma stratification: liver function assessment, BCLC staging, and treatment recommendations.

Journal: Scientific Reports

Published Date: Jun 12, 2026

Abstract

Large language models (LLMs) such as GPT have been proposed to support complex clinical decision-making. This study evaluated the performance of GPT-based LLMs in analyzing clinical, radiological, and laboratory data from patients with hepatocellular carcinoma (HCC) to assess liver function, assign BCLC stage, and recommend treatment. Data from 106 HCC patients (82% male, median age 65 [22-86]) were compiled into anonymized integrated reports. Four GPT versions (4, o1, o3, 5.4) were prompted, using both short and long instructions, to calculate MELD, ALBI, and Child-Pugh scores, assign BCLC stage, and generate treatment recommendations based on current guidelines. Outputs were compared to expert consensus and tumor board decisions. Errors were categorized by type and source. Time and cost analyses compared GPT to clinical staff. All GPT versions achieved high accuracy (>85%) in liver function assessment, with MELD calculation being the most error-prone. BCLC staging accuracy ranged from 46.2% (version 4) to 84.0% (o3), with misclassification of radiological reports as the main error source. Reasoning-optimized models (o1, o3) performed best for treatment recommendations, achieving an overall accuracy (correct suggestions and acceptable alternatives) of up to 90.6%. In 9-14% of cases, GPT suggestions were retrospectively more guideline-concordant than tumor board decisions. GPT processing was significantly faster and reduced costs by approximately 300- to 1300-fold compared to clinical staff. GPT-based LLMs show potential as decision-support tools for liver function assessment, BCLC staging, and treatment guidance in HCC. Particularly with reasoning-optimized models and detailed prompting, LLMs may serve as valuable adjuncts in multidisciplinary HCC workflows. However, a non-negligible error rate requires expert oversight and further model refinement.
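For readers unfamiliar with the liver function scores named in the abstract, the following is a minimal sketch of two of them, ALBI and MELD, using the commonly published formulas. It is not the study's code: the abstract does not state which MELD variant (classic MELD, MELD-Na, MELD 3.0) was used, so the classic formula is assumed here, and the function names, parameter names, and units are illustrative choices.

```python
import math


def albi_score(bilirubin_umol_l: float, albumin_g_l: float) -> float:
    """ALBI score from bilirubin (umol/L) and albumin (g/L).

    Commonly published formula: 0.66 * log10(bilirubin) - 0.085 * albumin.
    """
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l


def albi_grade(score: float) -> int:
    """Map an ALBI score to grade 1-3 using the standard cut-offs."""
    if score <= -2.60:
        return 1
    if score <= -1.39:
        return 2
    return 3


def meld_score(bilirubin_mg_dl: float, inr: float, creatinine_mg_dl: float) -> int:
    """Classic MELD (assumed variant; the abstract does not specify one).

    Lab values below 1.0 are raised to 1.0 and creatinine is capped at
    4.0 mg/dL before taking natural logarithms.
    """
    bili = max(bilirubin_mg_dl, 1.0)
    inr_clamped = max(inr, 1.0)
    crea = min(max(creatinine_mg_dl, 1.0), 4.0)
    raw = (
        3.78 * math.log(bili)
        + 11.2 * math.log(inr_clamped)
        + 9.57 * math.log(crea)
        + 6.43
    )
    return round(raw)


if __name__ == "__main__":
    # Hypothetical lab values, not taken from the study cohort.
    score = albi_score(bilirubin_umol_l=20.0, albumin_g_l=35.0)
    print(f"ALBI {score:.2f}, grade {albi_grade(score)}")
    print(f"MELD {meld_score(bilirubin_mg_dl=2.0, inr=1.5, creatinine_mg_dl=1.2)}")
```

Both scores are sensitive to units: ALBI as written expects bilirubin in umol/L and albumin in g/L, whereas classic MELD expects bilirubin and creatinine in mg/dL. Child-Pugh is omitted because it also depends on graded clinical findings (ascites, encephalopathy) rather than laboratory values alone.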
