f-BGM enables fungi-specific genome mining in high accuracy and interpretability
Journal:
bioRxiv
Published Date:
Jan 1, 2025
Abstract
Emerging artificial intelligence (AI)-based genome mining methods have revolutionized the paradigm of bacterial secondary metabolite (SM) discovery. The recent accumulation of data on fungal biosynthetic gene clusters (BGCs) now offers an opportunity for the systematic development and evaluation of fungi-specific pipelines. In this work, we propose a deep learning framework, termed f-BGM, designed specifically for fungal genome mining. By using a novel self-attention-based architecture to augment inter-domain associations in local genomic contexts, f-BGM outperforms existing AI-based methods in both in-distribution and out-of-distribution benchmark tests for BGC detection. Further analyses demonstrate that f-BGM offers reasonable interpretability in deciphering the importance of single domains and single proteins, as well as inter-domain partnerships. By establishing additional binary classification models, f-BGM also achieves high-quality identification of core enzymes within given BGCs. Finally, case studies of f-BGM-driven genome mining in marine fungi uncover biosynthetic potential underestimated by the rule-based method antiSMASH, as supported by experimental and computational validation.
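To illustrate the general idea of self-attention over protein domains in a local genomic context, the following is a minimal NumPy sketch of single-head scaled dot-product self-attention. It is not the f-BGM architecture, which the abstract does not specify. The function name `self_attention`, the random domain embeddings, the projection matrices, and the dimensions are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(domain_embeddings, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (hypothetical helper).

    domain_embeddings: (n_domains, d_model) array, one row per domain
    in a genomic window. Returns the contextualized embeddings and the
    (n_domains, n_domains) attention weight matrix.
    """
    q = domain_embeddings @ w_q
    k = domain_embeddings @ w_k
    v = domain_embeddings @ w_v
    # Pairwise domain-to-domain scores, scaled by sqrt of the key dimension.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy input: 6 domains with 8-dimensional embeddings (assumed values).
rng = np.random.default_rng(0)
n_domains, d_model = 6, 8
x = rng.normal(size=(n_domains, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

context, weights = self_attention(x, w_q, w_k, w_v)
```

In a sketch like this, row `i` of `weights` describes how strongly domain `i` attends to every other domain in the window. Attention matrices of this kind are one common starting point for examining inter-domain associations, although whether f-BGM derives its interpretability results this way is not stated in the abstract.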