A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings.
Journal: Nature Communications
PMID: 40169573
Abstract
Large foundation models show promise in biomedicine but face challenges in clinical use due to performance gaps, accessibility, cost, and lack of scalable evaluation. Here we show that open-source small multimodal models can bridge these gaps in radiology by generating free-text findings from chest X-ray images. Our data-centric approach leverages 697K curated radiology image-text pairs to train a specialized, domain-adapted chest X-ray encoder. We integrate this encoder with pre-trained language models via a lightweight adapter that aligns image and text modalities. To enable robust, clinically relevant evaluation, we develop and validate CheXprompt, a GPT-4-based metric for assessing factual accuracy that is aligned with radiologists' evaluations. Benchmarked with CheXprompt and other standard factuality metrics, LLaVA-Rad (7B) achieves state-of-the-art performance, outperforming much larger models such as GPT-4V and Med-PaLM M (84B). While not immediately ready for real-time clinical deployment, LLaVA-Rad is a scalable, privacy-preserving, and cost-effective step towards clinically adaptable multimodal AI for radiology.
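To make the "lightweight adapter" idea concrete, the sketch below shows one common way such a bridge can be built: a small MLP that projects patch embeddings from an image encoder into a language model's token-embedding space, so the projected "visual tokens" can be prepended to the text prompt. This is a minimal illustration under assumed dimensions and layer choices, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a lightweight adapter projecting
# chest X-ray encoder features into a language model's embedding space.
# All dimensions and the two-layer MLP design are illustrative assumptions.
import torch
import torch.nn as nn

class ImageToTextAdapter(nn.Module):
    def __init__(self, image_dim: int = 768, text_dim: int = 4096, hidden_dim: int = 2048):
        super().__init__()
        # Small MLP: the trainable bridge between a frozen image encoder
        # and a pre-trained language model.
        self.proj = nn.Sequential(
            nn.Linear(image_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, text_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, image_dim) from the X-ray encoder
        # returns:        (batch, num_patches, text_dim) "visual tokens" for the LM
        return self.proj(image_features)

# Usage: project patch embeddings and prepend them to the text-token embeddings.
adapter = ImageToTextAdapter()
patch_embeddings = torch.randn(1, 196, 768)    # placeholder encoder output
visual_tokens = adapter(patch_embeddings)      # (1, 196, 4096)
text_embeddings = torch.randn(1, 32, 4096)     # placeholder prompt embeddings
lm_input = torch.cat([visual_tokens, text_embeddings], dim=1)
```

Freezing the image encoder and language model while training only such an adapter keeps the trainable parameter count small, which is one way a 7B-scale multimodal model can be adapted to a new domain at modest cost.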