Diagnostic Accuracy and Clinical Value of a Domain-specific Multimodal Generative AI Model for Chest Radiograph Report Generation.
Journal: Radiology
PMID: 40131111
Abstract
Background: Generative artificial intelligence (AI) is anticipated to alter radiology workflows, requiring a clinical value assessment for frequent examinations such as chest radiograph interpretation.

Purpose: To develop and evaluate the diagnostic accuracy and clinical value of a domain-specific multimodal generative AI model for providing preliminary interpretations of chest radiographs.

Materials and Methods: For training, consecutive radiograph-report pairs from frontal chest radiography were retrospectively collected from 42 hospitals (2005-2023). The trained domain-specific AI model generated radiology reports for the radiographs. The test set included public datasets (PadChest, Open-i, VinDr-CXR, and MIMIC-CXR-JPG) and radiographs excluded from training. The sensitivity and specificity of the model-generated reports for 13 radiographic findings, compared with radiologist annotations (reference standard), were calculated with 95% CIs. Four radiologists evaluated the subjective quality of the reports in terms of acceptability, agreement score, quality score, and comparative ranking of reports from the domain-specific AI model, radiologists, and a general-purpose large language model (GPT-4Vision). Acceptability was defined as whether the radiologist would endorse the report as their own without changes. Agreement scores, from 1 (clinically significant discrepancy) to 5 (complete agreement), were assigned using RADPEER; quality scores were on a 5-point Likert scale from 1 (very poor) to 5 (excellent).

Results: A total of 8 838 719 radiograph-report pairs (training) and 2145 radiographs (testing) were included (sex and gender information was unavailable owing to anonymization). Reports generated by the domain-specific AI model demonstrated high sensitivity for detecting two critical radiographic findings: 95.3% (181 of 190) for pneumothorax and 92.6% (138 of 149) for subcutaneous emphysema. Acceptance rates, evaluated by four radiologists, were 70.5% (6047 of 8580), 73.3% (6288 of 8580), and 29.6% (2536 of 8580) for model-generated, radiologist, and GPT-4Vision reports, respectively. Agreement scores were highest for the model-generated reports (median = 4 [IQR, 3-5]) and lowest for the GPT-4Vision reports (median = 1 [IQR, 1-3]; P < .001). Quality scores were also highest for the model-generated reports (median = 4 [IQR, 3-5]) and lowest for the GPT-4Vision reports (median = 2 [IQR, 1-3]; P < .001). In the ranking analysis, model-generated reports were most frequently ranked highest (60.0%; 5146 of 8580), and GPT-4Vision reports were most frequently ranked lowest (73.6%; 6312 of 8580).

Conclusion: A domain-specific multimodal generative AI model demonstrated potential for high diagnostic accuracy and clinical value in providing preliminary interpretations of chest radiographs for radiologists.

© RSNA, 2025. See also the editorial by Little in this issue.
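The proportions reported above can be recomputed directly from the stated counts. The following is a minimal Python sketch, not taken from the paper, that reproduces the headline sensitivities and acceptance rates and attaches 95% Wilson score confidence intervals; the abstract reports 95% CIs but does not state which interval method was used, so the Wilson interval here is an assumption.

```python
# Sketch only: recompute the abstract's reported proportions.
# Counts are taken from the Results; the Wilson score interval is an
# assumed CI method (the paper does not specify one).
from math import sqrt

Z = 1.959964  # two-sided 95% critical value of the standard normal


def wilson_ci(successes: int, total: int, z: float = Z) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return center - half, center + half


# (finding, cases correctly flagged by model reports, reference-positive cases)
findings = [
    ("Pneumothorax", 181, 190),            # reported sensitivity 95.3%
    ("Subcutaneous emphysema", 138, 149),  # reported sensitivity 92.6%
]

for name, hits, n in findings:
    lo, hi = wilson_ci(hits, n)
    print(f"{name}: sensitivity {hits}/{n} = {hits / n:.1%} "
          f"(95% CI: {lo:.1%}, {hi:.1%})")

# Acceptance rates from the reader study: 4 radiologists x 2145 radiographs
# gives 8580 ratings per report source.
for source, accepted in [("Model", 6047),
                         ("Radiologist", 6288),
                         ("GPT-4Vision", 2536)]:
    print(f"{source}: {accepted}/8580 = {accepted / 8580:.1%} accepted")
```

Running this reproduces the reported 95.3% and 92.6% sensitivities and the 70.5%, 73.3%, and 29.6% acceptance rates, which also confirms that the 8580 denominator (4 readers x 2145 test radiographs) is consistent across the reader-study results.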