IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites.

Journal: mBio
PMID:

Abstract

In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-standing void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules.

Authors

  • Michalis Hadjithomas
    Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, California, USA.
  • I-Min Amy Chen
    Biosciences Computing, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
  • Ken Chu
    Biosciences Computing, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
  • Anna Ratner
    Biosciences Computing, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
  • Krishna Palaniappan
    Biosciences Computing, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
  • Ernest Szeto
    Biosciences Computing, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
  • Jinghua Huang
    Biosciences Computing, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
  • T B K Reddy
    Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, California, USA.
  • Peter Cimermančič
    Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, USA.
  • Michael A Fischbach
    Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California, USA.
  • Natalia N Ivanova
    Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, California, USA.
  • Victor M Markowitz
    Biosciences Computing, Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
  • Nikos C Kyrpides
    Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, California, USA. nckyrpides@lbl.gov
  • Amrita Pati
    Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, California, USA. apati@lbl.gov