Accelerating Natural Product Discovery with Linked MS-Genomics and Language/Transformer-Based Models

Journal: bioRxiv
Published Date:

Abstract

Integrated multi-modal characterization of a microbial strain library streamlines natural product discovery. Using language- and transformer-based models to cross-validate linked mass spectrometry (MS)-genome datasets, we rapidly identify microbial producers of diverse natural products with high precision (75-100%). Our findings demonstrate the transformative potential of strain-level linked MS-genome datasets to significantly accelerate discovery and to extend our understanding of microbes beyond currently known and curated knowledge.

Authors

  • Dillon W. P. Tay; Winston Koh; Shi Jun Ang; Zicong Marvin Wong; Yi Wee Lim; Elena Heng; Naythan Z. X. Yeo; Krishnan Adaikkappan; Fong Tian Wong; Yee Hwee Lim