Frozen Large-Scale Pretrained Vision-Language Models are the Effective Foundational Backbone for Multimodal Breast Cancer Prediction.

Journal: IEEE Journal of Biomedical and Health Informatics
Published Date:

Abstract

Breast cancer is a pervasive global health concern among women. Leveraging multimodal data from enterprise patient databases, including Picture Archiving and Communication Systems (PACS) and Electronic Health Records (EHRs), holds promise for improving prediction. This study introduces a multimodal deep-learning model for breast cancer prediction and evaluates it on mammogram datasets. Our approach integrates frozen large-scale pretrained vision-language models and shows superior performance and stability compared to traditional image-tabular models across two public breast cancer datasets. By pairing the frozen pretrained vision-language models with a lightweight trainable classifier, the model consistently outperforms conventional full fine-tuning methods. On the CBIS-DDSM dataset, the Area Under the Curve (AUC) increases from 0.867 to 0.902 on validation and from 0.803 to 0.830 on the official test set. On the EMBED dataset, AUC improves from 0.780 to 0.805 on validation. In scenarios with limited data, using Breast Imaging-Reporting and Data System category three (BI-RADS 3) cases, AUC improves from 0.91 to 0.96 on the official CBIS-DDSM test set and from 0.79 to 0.83 on a challenging validation set. This study also underscores the benefit of vision-language models for jointly training on diverse image-clinical datasets from multiple healthcare institutions, which addresses the challenge of non-aligned tabular features. Combining training data in this way further improves breast cancer prediction on the EMBED dataset, outperforming all other experiments. In summary, frozen large-scale pretrained vision-language models are an effective backbone for multimodal breast cancer prediction, offering superior performance and stability over conventional methods.
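The core idea in the abstract, a frozen pretrained encoder feeding a lightweight trainable classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the encoder here is a hypothetical stand-in (a fixed random projection rather than a real vision-language model), the data are synthetic, and names such as `FROZEN_W`, `encode`, and `train_head` are assumptions made for illustration. It only shows the training pattern: embeddings are computed once with weights that never change, and gradient descent updates the small classifier head alone.

```python
import math
import random

random.seed(0)

IN_DIM, EMB_DIM = 8, 4

# Hypothetical stand-in for a frozen pretrained encoder: fixed weights that
# are never updated during training (a real system would load a pretrained
# vision-language model here instead).
FROZEN_W = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(EMB_DIM)]


def encode(x):
    """Map a raw input vector to an embedding using the frozen weights."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in FROZEN_W]


def predict(head, emb):
    """Lightweight trainable classifier: logistic regression on the embedding."""
    z = head["b"] + sum(w * e for w, e in zip(head["w"], emb))
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(head, embs, labels):
    """Average binary cross-entropy of the head over a set of embeddings."""
    eps = 1e-12
    total = 0.0
    for e, y in zip(embs, labels):
        p = predict(head, e)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


def train_head(embs, labels, lr=0.5, epochs=200):
    """Full-batch gradient descent that updates only the classifier head."""
    head = {"w": [0.0] * EMB_DIM, "b": 0.0}
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * EMB_DIM
        grad_b = 0.0
        for e, y in zip(embs, labels):
            err = predict(head, e) - y
            for j in range(EMB_DIM):
                grad_w[j] += err * e[j] / n
            grad_b += err / n
        head["w"] = [w - lr * g for w, g in zip(head["w"], grad_w)]
        head["b"] -= lr * grad_b
    return head


# Synthetic toy data (not from CBIS-DDSM or EMBED).
inputs = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(64)]
embs = [encode(x) for x in inputs]  # computed once, since the encoder is frozen
labels = [1 if e[0] > 0 else 0 for e in embs]

frozen_before = [row[:] for row in FROZEN_W]
untrained = {"w": [0.0] * EMB_DIM, "b": 0.0}
loss_before = mean_loss(untrained, embs, labels)
head = train_head(embs, labels)
loss_after = mean_loss(head, embs, labels)
```

Because the encoder is frozen, embeddings can be cached and reused across epochs, and only `EMB_DIM + 1` parameters are trained in this toy setup. That small trainable footprint is one plausible reason such a design could behave more stably than full fine-tuning when data are limited, although the sketch itself does not demonstrate that claim.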

Authors

  • Hung Q Vo
  • Lin Wang
    Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China.
  • Kelvin K Wong
    Systems Medicine and Bioengineering, Houston Methodist Cancer Center, Houston Methodist Hospital and Department of Radiology, Weill Cornell Medicine, 6670 Bertner Ave, Houston, TX 77030, USA; The Ting Tsung and Wei Fong Chao Center for BRAIN, Houston Methodist Hospital, 6670 Bertner Ave, Houston, TX 77030, USA; Department of Radiology, Houston Methodist Institute for Academic Medicine, 6670 Bertner Ave, Houston, TX 77030, USA. Electronic address: kwong@houstonmethodist.org.
  • Chika F Ezeana
    Houston Methodist, Houston, TX.
  • Xiaohui Yu
    Beijing Key Laboratory for Green Catalysis and Separation, Key Laboratory of Beijing on Regional Air Pollution Control, Key Laboratory of Advanced Functional Materials, Education Ministry of China, Laboratory of Catalysis Chemistry and Nanoscience, Department of Environmental Chemical Engineering, School of Environmental and Chemical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, China.
  • Wei Yang
    Key Laboratory of Structure-Based Drug Design and Discovery (Shenyang Pharmaceutical University), Ministry of Education, School of Traditional Chinese Materia Medica, Shenyang Pharmaceutical University, Wenhua Road 103, Shenyang 110016, PR China. Electronic address: 421063202@qq.com.
  • Jenny Chang
    Houston Methodist, Houston, TX.
  • Hien V Nguyen
  • Stephen T C Wong
    Translational Biophotonics Laboratory, Department of Systems Medicine and Bioengineering, Houston Me, United States.