CL-MFAP: A Contrastive Learning-Based Multimodal Foundation Model for Molecular Property Prediction and Antibiotic Screening
Journal:
arXiv
Published Date:
Feb 16, 2025
Abstract
Due to the rise in antimicrobial resistance, identifying novel compounds with
antibiotic potential is crucial for combatting this global health issue.
However, traditional drug development methods are costly and inefficient.
Recognizing the pressing need for more effective solutions, researchers have
turned to machine learning techniques to streamline the prediction and
development of novel antibiotic compounds. While foundation models have shown
promise in antibiotic discovery, current mainstream efforts still fall short of
fully leveraging the potential of multimodal molecular data. Recent studies
suggest that contrastive learning frameworks utilizing multimodal data exhibit
excellent performance in representation learning across various domains.
Building upon this, we introduce CL-MFAP, an unsupervised contrastive learning
(CL)-based multimodal foundation (MF) model specifically tailored for
discovering small molecules with potential antibiotic properties (AP) using
three types of molecular data. This model employs 1.6 million bioactive
molecules with drug-like properties from the ChEMBL dataset to jointly pretrain
three encoders: (1) a transformer-based encoder with rotary position embedding
for processing SMILES strings; (2) another transformer-based encoder,
incorporating a novel bi-level routing attention mechanism to handle molecular
graph representations; and (3) a Morgan fingerprint encoder using a multilayer
perceptron, to achieve the contrastive learning purpose. The CL-MFAP
outperforms baseline models in antibiotic property prediction by effectively
utilizing different molecular modalities and demonstrates superior
domain-specific performance when fine-tuned for antibiotic-related property
prediction tasks.