Predicting molecular types of adult-type diffuse gliomas based on MRI reports with large language models.
Journal:
European Radiology
Published Date:
Dec 22, 2025
Abstract
OBJECTIVES: To evaluate the performance of large language models (LLMs) in predicting molecular types of adult-type diffuse gliomas according to the 2021 WHO classification using MRI radiology reports.
MATERIALS AND METHODS: This retrospective study included 2169 patients diagnosed with adult-type diffuse gliomas (294 oligodendrogliomas, 295 IDH-mutant astrocytomas, and 1580 IDH-wildtype glioblastomas) between July 2005 and March 2024 at four hospitals in Asia and Europe. Seven proprietary and open-source LLMs were assessed: GPT-4o-mini, GPT-4.1-mini, Llama 3.1 8B, Llama 3.1 70B, Qwen2.5 7B, Deepseek-r1 8B, and Mistral 7B. The accuracy of each LLM in classifying molecular types was compared between a prompt that supplied relevant knowledge of glioma imaging findings (knowledge-based prompt) and one that did not (naïve prompt). The impact of radiologists' subspecialization in neuro-oncology, report quality, and reporting language on LLMs' performance was also evaluated.
RESULTS: With the knowledge-based prompt, accuracy was significantly higher than with the naïve prompt for three LLMs (naïve vs. knowledge-based: GPT-4o-mini, 77.0% vs. 79.1%, p < 0.001; Qwen2.5 7B, 75.9% vs. 79.5%, p < 0.001; Deepseek-r1 8B, 66.0% vs. 73.2%, p < 0.001) and comparable for three others (GPT-4.1-mini, 78.7% vs. 78.6%; Llama 3.1 70B, 78.0% vs. 78.1%; Mistral 7B, 58.4% vs. 57.4%). The exception was Llama 3.1 8B, whose accuracy was significantly lower with the knowledge-based prompt (65.4% vs. 44.6%, p < 0.001). Differences in accuracy between the two prompts were more pronounced in smaller LLMs. Additionally, accuracy was significantly higher for reports written by neuro-oncology specialists and for high-quality reports in all LLMs (p < 0.001).
CONCLUSIONS: LLMs may provide preoperative information on the tumor types of adult-type diffuse gliomas from MRI reports when relevant knowledge is supplied in the prompt. Informative and descriptive reports could further enhance LLMs' performance.
KEY POINTS: Question: Our study aimed to evaluate large language models' (LLMs) ability to efficiently predict molecular types of adult-type diffuse gliomas according to the 2021 WHO classification. Findings: Larger models generally showed better accuracy and were less sensitive to domain-specific knowledge. Their performance improved when using high-quality, longer reports or reports by neuro-oncology specialists. Clinical relevance: These findings highlight the potential role of LLMs in predicting glioma molecular types, underscoring the importance of informative and descriptive reports in enhancing their performance.
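The comparison between a naïve and a knowledge-based prompt can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the prompt wording, the knowledge text, or the evaluation code, so `KNOWLEDGE`, `build_prompt`, `parse_label`, `accuracy`, and the `ask_model` callable are hypothetical names, and the imaging hints are generic textbook associations rather than the study's own prompt.

```python
from typing import Callable, Optional, Sequence

# The three molecular types named in the abstract.
LABELS = (
    "oligodendroglioma",
    "IDH-mutant astrocytoma",
    "IDH-wildtype glioblastoma",
)

# Illustrative knowledge snippet only; NOT the prompt text used in the study.
KNOWLEDGE = (
    "Imaging hints: calcification favors oligodendroglioma; "
    "the T2-FLAIR mismatch sign favors IDH-mutant astrocytoma; "
    "necrosis with rim enhancement favors IDH-wildtype glioblastoma."
)


def build_prompt(report: str, with_knowledge: bool) -> str:
    """Build a naive prompt, or a knowledge-based one if with_knowledge is True."""
    parts = ["Classify the tumor in this MRI report as one of: " + "; ".join(LABELS) + "."]
    if with_knowledge:
        parts.append(KNOWLEDGE)
    parts.append("Report: " + report)
    parts.append("Answer with the label only.")
    return "\n".join(parts)


def parse_label(answer: str) -> Optional[str]:
    """Return the first known label found in the model's answer, else None."""
    text = answer.lower()
    for label in LABELS:
        if label.lower() in text:
            return label
    return None


def accuracy(
    reports: Sequence[str],
    truths: Sequence[str],
    ask_model: Callable[[str], str],  # hypothetical: sends a prompt, returns text
    with_knowledge: bool,
) -> float:
    """Fraction of reports whose parsed prediction matches the ground truth."""
    correct = 0
    for report, truth in zip(reports, truths):
        predicted = parse_label(ask_model(build_prompt(report, with_knowledge)))
        correct += int(predicted == truth)
    return correct / len(reports)
```

Calling `accuracy` twice on the same reports, once with `with_knowledge=False` and once with `with_knowledge=True`, gives the paired accuracies that a naïve vs. knowledge-based comparison of this kind would report; the statistical test behind the reported p-values is not stated in the abstract and is not sketched here.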