Accelerating Natural Product Discovery with Linked MS-Genomics and Language/Transformer-Based Models
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Integrated multi-modal characterization of a microbial strain library streamlines natural product discovery. By using language- and transformer-based models to cross-validate linked mass spectrometry (MS) and genome datasets, microbial producers of diverse natural products are rapidly identified with high precision (75-100%). Our findings demonstrate the transformative potential of linked MS-genome datasets at the strain level to accelerate discovery and to extend our understanding of microbes beyond currently known and curated knowledge.
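The cross-validation idea can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each modality (an MS-based model and a genome-based model) has already produced a set of strain-to-compound-class calls, keeps only the calls both modalities agree on, and computes precision against a set of confirmed producers. All names (`cross_validate`, `precision`, the strain IDs, and the compound classes) are hypothetical, and the data are toy values chosen only for illustration.

```python
from typing import Set, Tuple

# A call links a strain to a predicted compound class (hypothetical schema).
Link = Tuple[str, str]


def cross_validate(ms_calls: Set[Link], genome_calls: Set[Link]) -> Set[Link]:
    """Keep only strain-compound links supported by both modalities."""
    return ms_calls & genome_calls


def precision(predicted: Set[Link], confirmed: Set[Link]) -> float:
    """Fraction of predicted links that are confirmed (0.0 if nothing was predicted)."""
    if not predicted:
        return 0.0
    return len(predicted & confirmed) / len(predicted)


# Toy inputs, not data from the study.
ms_calls = {
    ("strain_A", "siderophore"),
    ("strain_B", "lipopeptide"),
    ("strain_C", "polyketide"),
    ("strain_D", "terpene"),
}
genome_calls = {
    ("strain_A", "siderophore"),
    ("strain_B", "lipopeptide"),
    ("strain_C", "polyketide"),
    ("strain_E", "terpene"),
}
confirmed = {
    ("strain_A", "siderophore"),
    ("strain_B", "lipopeptide"),
}

consensus = cross_validate(ms_calls, genome_calls)
score = precision(consensus, confirmed)
print(sorted(consensus))
print(f"precision = {score:.2f}")
```

Requiring agreement between modalities trades recall for precision: a link seen in only one dataset (here `strain_D` and `strain_E`) is dropped even if it is real.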