Multimodal deep learning for enhanced breast cancer diagnosis on sonography.

Journal: Computers in Biology and Medicine
Published Date:

Abstract

This study introduces a novel multimodal deep learning model tailored for the differentiation of benign and malignant breast masses using dual-view breast ultrasound images (radial and anti-radial views) in conjunction with the corresponding radiology reports. The proposed multimodal architecture includes specialized image and text encoders for independent feature extraction, along with a transformation layer that aligns the multimodal features for the subsequent classification task. The model achieved an area under the curve (AUC) of 85% and outperformed the unimodal models by 6% and 8% in Youden index. Additionally, our multimodal model surpassed zero-shot predictions generated by prominent foundation models such as CLIP and MedCLIP. In a direct comparison with classification based on physician-assessed ratings, our model exhibited clear superiority, highlighting its practical significance in diagnostics. By integrating both image and text modalities, this study exemplifies the potential of multimodal deep learning to enhance diagnostic performance, laying the foundation for robust and transparent AI-assisted solutions.
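The pipeline described in the abstract, with separate image and text encoders, a transformation layer aligning the two modalities, and a joint binary classifier, can be sketched as follows. This is a minimal illustrative sketch: the linear encoders, feature dimensions, and the averaging of the two views are assumptions for demonstration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper)
IMG_DIM, TXT_DIM, SHARED_DIM = 512, 768, 256

# Placeholder encoders: a real system would use CNN/transformer backbones
W_img = rng.normal(size=(IMG_DIM, SHARED_DIM)) * 0.02
W_txt = rng.normal(size=(TXT_DIM, SHARED_DIM)) * 0.02  # transformation layer:
                                                       # maps text features into
                                                       # the shared image space
W_clf = rng.normal(size=(2 * SHARED_DIM, 1)) * 0.02    # benign vs. malignant head

def encode_image(x: np.ndarray) -> np.ndarray:
    """Toy image encoder: linear projection + nonlinearity."""
    return np.tanh(x @ W_img)

def encode_text(t: np.ndarray) -> np.ndarray:
    """Toy text encoder for the radiology report features."""
    return np.tanh(t @ W_txt)

# Dual-view inputs (radial and anti-radial) fused here by simple averaging
img_radial = rng.normal(size=(1, IMG_DIM))
img_antiradial = rng.normal(size=(1, IMG_DIM))
report_feat = rng.normal(size=(1, TXT_DIM))

img_feat = encode_image((img_radial + img_antiradial) / 2)
txt_feat = encode_text(report_feat)

# Concatenate the aligned modalities, then classify with a sigmoid output
fused = np.concatenate([img_feat, txt_feat], axis=1)
prob_malignant = 1.0 / (1.0 + np.exp(-(fused @ W_clf)))
print(fused.shape, float(prob_malignant))
```

The key design point mirrored here is that each modality is encoded independently before the transformation layer brings the features into a common space, so either branch can be swapped or ablated (e.g., for the unimodal baselines the paper compares against) without retraining the other.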

Authors

  • Ting-Ruen Wei
    Santa Clara University, 500 El Camino Real, Santa Clara, 95053, CA, USA.
  • Aileen Chang
    Santa Clara Valley Medical Center, 751 S. Bascom Ave, San Jose, 95128, CA, USA.
  • Young Kang
    Santa Clara Valley Medical Center, 751 S. Bascom Ave, San Jose, 95128, CA, USA.
  • Mahesh Patel
    Dr B R Ambedkar National Institute of Technology Jalandhar, Jalandhar-144008, Punjab, India.
  • Yi Fang
    Department of Neurosurgery, The Fuzhou General Hospital, Fuzhou, China.
  • Yuling Yan
    Ultrasound Department, Shenzhen Futian District Maternity & Child Healthcare Hospital, Shenzhen, Guangdong, 518016, China.