A scoping review on multimodal deep learning in biomedical images and texts.

Journal: Journal of Biomedical Informatics

Abstract

OBJECTIVE: Computer-assisted diagnostic and prognostic systems of the future should be capable of processing multimodal data simultaneously. Multimodal deep learning (MDL), which integrates multiple sources of data such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data. However, it has only recently attracted researchers' attention. To this end, there is a critical need to systematically review work on this topic, identify the limitations of current efforts, and explore future directions.
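
To make the image-text integration described above concrete, below is a minimal late-fusion sketch in PyTorch. It is an illustrative assumption rather than a method from the review: the LateFusionClassifier name, the embedding dimensions, and the concatenation-based fusion are hypothetical stand-ins for whatever pretrained image and text encoders a real system would use.

    # Hypothetical late-fusion sketch: project each modality's embedding to a
    # shared size, concatenate, and classify. Not a method from the review.
    import torch
    import torch.nn as nn

    class LateFusionClassifier(nn.Module):
        def __init__(self, image_dim=512, text_dim=768, hidden_dim=256, num_classes=2):
            super().__init__()
            # In practice, image_emb/text_emb would come from pretrained
            # encoders (e.g., a CNN/ViT for images, a language model for text).
            self.image_proj = nn.Linear(image_dim, hidden_dim)
            self.text_proj = nn.Linear(text_dim, hidden_dim)
            self.classifier = nn.Sequential(
                nn.ReLU(),
                nn.Linear(2 * hidden_dim, num_classes),
            )

        def forward(self, image_emb, text_emb):
            # Fuse the two modalities by concatenating their projections.
            fused = torch.cat([self.image_proj(image_emb),
                               self.text_proj(text_emb)], dim=-1)
            return self.classifier(fused)

    # Random tensors stand in for encoder outputs (batch of 4).
    model = LateFusionClassifier()
    logits = model(torch.randn(4, 512), torch.randn(4, 768))
    print(logits.shape)  # torch.Size([4, 2])

Concatenation is only one fusion strategy; early fusion and attention-based cross-modal fusion follow the same encode-fuse-predict pattern, which is why reviews in this area commonly organize methods by fusion stage.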

Authors

  • Zhaoyi Sun
Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
  • Mingquan Lin
Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
  • Qingqing Zhu
    National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA. Electronic address: qingqing.zhu@nih.gov.
  • Qianqian Xie
Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, USA.
  • Fei Wang
Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY, USA.
  • Zhiyong Lu
National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA.
  • Yifan Peng
Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.