How well do multimodal LLMs interpret CT scans? An auto-evaluation framework for analyses.

Journal: Journal of Biomedical Informatics

Published Date: Jun 25, 2025

Abstract

OBJECTIVE: This study introduces a novel evaluation framework, GPTRadScore, to systematically assess the performance of multimodal large language models (MLLMs) in generating clinically accurate findings from CT imaging. Specifically, GPTRadScore leverages LLMs as an evaluation metric, aiming to provide a more accurate and clinically informed assessment than traditional language-specific methods. Using this framework, we evaluate the capability of several MLLMs, including GPT-4 with Vision (GPT-4V), Gemini Pro Vision, LLaVA-Med, and RadFM, to interpret findings in CT scans.

Authors

Qingqing Zhu

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA. Electronic address: qingqing.zhu@nih.gov.

Benjamin Hou

Biomedical Image Analysis Group (BioMedIA), Imperial College London, London, UK.

Tejas Sudarshan Mathai

Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Clinical Center, National Institutes of Health, 10 Center Drive, Bethesda, 20892, MD, USA.

Pritam Mukherjee

Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Department of Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Bethesda, MD, USA. pritam.mukherjee@nih.gov.

Qiao Jin

National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Xiuying Chen

Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia.

Zhizheng Wang

National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Ruida Cheng

Scientific Application Services (SAS), Office of Scientific Computing Services (OSCS), Office of Intramural Research, Center of Information Technology, National Institutes of Health, Bethesda, Maryland.

Ronald M Summers

National Institutes of Health, Clinical Center, Radiology and Imaging Sciences, 10 Center Drive, Bethesda, MD 20892, USA.

Zhiyong Lu

National Center for Biotechnology Information, Bethesda, MD 20894 USA.

Keywords

Algorithms; Humans; Natural Language Processing; Retrospective Studies; Tomography, X-Ray Computed

External Resources

PubMed ID: 40578543
