A clinically accessible small multimodal radiology model and evaluation metric for chest X-ray findings.

Journal: Nature Communications
PMID:

Abstract

Large foundation models show promise in biomedicine but face challenges in clinical use due to performance gaps, accessibility, cost, and lack of scalable evaluation. Here we show that open-source small multimodal models can bridge these gaps in radiology by generating free-text findings from chest X-ray images. Our data-centric approach leverages 697K curated radiology image-text pairs to train a specialized, domain-adapted chest X-ray encoder. We integrate this encoder with pre-trained language models via a lightweight adapter that aligns the image and text modalities. To enable robust, clinically relevant evaluation, we develop and validate CheXprompt, a GPT-4-based metric for assessing factual accuracy that is aligned with radiologists' evaluations. Benchmarked with CheXprompt and other standard factuality metrics, LLaVA-Rad (7B) achieves state-of-the-art performance, outperforming much larger models such as GPT-4V and Med-PaLM M (84B). While not immediately ready for real-time clinical deployment, LLaVA-Rad is a scalable, privacy-preserving, and cost-effective step towards clinically adaptable multimodal AI for radiology.
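The architecture the abstract describes, a domain-adapted image encoder whose outputs a lightweight adapter maps into a language model's embedding space, can be illustrated with a minimal PyTorch sketch. Everything below is an illustrative assumption rather than the paper's exact module: the class name ImageTextAdapter, the two-layer MLP design, and the dimensions (ViT-B-style 768-d patch features, a 4096-d embedding space typical of a 7B language model) are hypothetical.

    import torch
    import torch.nn as nn

    class ImageTextAdapter(nn.Module):
        """Hypothetical lightweight adapter: projects frozen image-encoder
        patch features into a language model's token-embedding space."""

        def __init__(self, vision_dim: int = 768, text_dim: int = 4096):
            super().__init__()
            # Two-layer MLP projection; dimensions are illustrative.
            self.proj = nn.Sequential(
                nn.Linear(vision_dim, text_dim),
                nn.GELU(),
                nn.Linear(text_dim, text_dim),
            )

        def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
            # patch_features: (batch, num_patches, vision_dim)
            # returns "visual tokens": (batch, num_patches, text_dim),
            # which would be prepended to the text-token embeddings.
            return self.proj(patch_features)

    # Dummy features standing in for a chest X-ray encoder's output.
    features = torch.randn(1, 196, 768)           # 14x14 patches, ViT-B width
    visual_tokens = ImageTextAdapter()(features)
    print(visual_tokens.shape)                    # torch.Size([1, 196, 4096])

In a design like this, only the adapter (and optionally the language model) needs training, which is what keeps the alignment step lightweight relative to end-to-end multimodal pre-training.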
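CheXprompt is described as a GPT-4-based metric that scores the factual accuracy of generated findings in a way aligned with radiologists' evaluations. The sketch below shows how such a metric could be queried; it assumes the openai Python client, and the rubric text is a paraphrase for illustration only, not the paper's validated CheXprompt prompt. The function name count_errors is hypothetical.

    from openai import OpenAI

    def count_errors(client: OpenAI, reference: str, candidate: str) -> str:
        """Hypothetical CheXprompt-style call: ask GPT-4 to count factual
        errors in a candidate report against a reference report."""
        prompt = (
            "You are a radiologist. Compare the candidate chest X-ray "
            "findings against the reference. Count clinically significant "
            "and insignificant errors (false findings, omissions, wrong "
            "location or severity). Reply as JSON: "
            '{"significant": int, "insignificant": int}.\n'
            f"Reference: {reference}\nCandidate: {candidate}"
        )
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp.choices[0].message.content

    # Usage (requires OPENAI_API_KEY in the environment):
    # client = OpenAI()
    # print(count_errors(client,
    #                    "No acute cardiopulmonary process.",
    #                    "Mild cardiomegaly without effusion."))

Pinning the temperature to 0 and requesting structured JSON keeps the scoring deterministic enough to aggregate error counts across a test set.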

Authors

  • Juan Manuel Zambrano Chaves
    Microsoft Research, Redmond, WA, USA.
  • Shih-Cheng Huang
    Stanford University, Stanford, CA, USA.
  • Yanbo Xu
    Microsoft Research, Redmond, WA, USA.
  • Hanwen Xu
    University of Washington, Seattle, WA, USA.
  • Naoto Usuyama
    Microsoft Research, Redmond, WA, USA.
  • Sheng Zhang
    Microsoft Research, Redmond, WA, USA.
  • Fei Wang
    Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY, USA.
  • Yujia Xie
    Microsoft Research, Redmond, WA, USA.
  • Mahmoud Khademi
    Microsoft Research, Redmond, WA, USA.
  • Ziyi Yang
    Microsoft Research, Redmond, WA, USA.
  • Hany Awadalla
    Microsoft Research, Redmond, WA, USA.
  • Julia Gong
    Microsoft Research, Redmond, WA, USA.
  • Houdong Hu
    Microsoft Research, Redmond, WA, USA.
  • Jianwei Yang
    Microsoft Research, Redmond, WA, USA.
  • Chunyuan Li
    Microsoft Research, Redmond, WA, USA.
  • Jianfeng Gao
    Microsoft Research, Redmond, WA, USA.
  • Yu Gu
    Microsoft Research, Redmond, WA, USA.
  • Cliff Wong
    Microsoft Research, Redmond, WA, USA.
  • Mu Wei
    Microsoft Research, Redmond, WA, USA.
  • Tristan Naumann
    Microsoft Research, Redmond, WA, USA.
  • Muhao Chen
    University of California, Davis, CA, USA.
  • Matthew P Lungren
  • Akshay Chaudhari
    Stanford University, Stanford, CA, USA.
  • Serena Yeung-Levy
    Stanford University, Stanford, CA, USA.
  • Curtis P Langlotz
    Stanford University, Stanford, CA, USA.
  • Sheng Wang
    University of Washington, Seattle, WA, USA.
  • Hoifung Poon
    Microsoft Research, Redmond, WA, USA. hoifung@microsoft.com.