GIT-Mol: A multi-modal large language model for molecular science with graph, image, and text.

Journal: Computers in biology and medicine

Published Date: Jan 30, 2024

Abstract

Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information in complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates graph, image, and text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture capable of aligning all modalities into a unified latent space. We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity compared to the baselines. With the any-to-language molecular translation strategy, our model has the potential to perform more downstream tasks, such as compound name recognition and chemical reaction prediction.

Authors

Pengfei Liu
Department of Anesthesiology, Beijing Shijitan Hospital, Capital Medical University, Beijing, China.

Yiming Ren
Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China.

Jun Tao
Department of Urology, the First Affiliated Hospital with Nanjing Medical University, Nanjing, China.

Zhixiang Ren
Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China. Electronic address: renzhx@pcl.ac.cn.

Keywords

Language; Natural Language Processing

External Resources

PubMed ID: 38359660
