MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations.

Journal: Bioinformatics (Oxford, England)

Abstract

MOTIVATION: Current deep learning models for the joint representation of molecules and text rely primarily on 1D or 2D molecular formats, neglecting 3D structural information that offers valuable physical insight. This narrow focus limits the models' versatility and adaptability across a wide range of modalities. Conversely, the few studies that focus on explicit 3D representations tend to overlook textual data from the biomedical domain.

Authors

  • Xiangru Tang
    Department of Computer Science, Yale University, New Haven, CT 06520, USA.
  • Andrew Tran
    Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.
  • Jeffrey Tan
    Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, USA.
  • Mark B Gerstein
    Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA. mark.gerstein@yale.edu.