PanKB: An interactive microbial pangenome knowledgebase for research, biotechnological innovation, and knowledge mining.

Journal: Nucleic acids research
PMID:

Abstract

The exponential growth of microbial genome data presents unprecedented opportunities for unlocking the potential of microorganisms. The burgeoning field of pangenomics offers a framework for extracting insights from this big biological data. Recent advances in microbial pangenomic research have generated substantial data and literature, yielding valuable knowledge across diverse microbial species. PanKB (pankb.org), a knowledgebase designed for microbial pangenomics research and biotechnological applications, was built to capitalize on this wealth of information. PanKB currently includes 51 pangenomes from 8 industrially relevant microbial families, comprising 8402 genomes, over 500 000 genes, and over 7 million mutations. To describe these data, PanKB implements four main components: (1) interactive pangenomic analytics to facilitate exploration, intuition, and potential discoveries; (2) alleleomic analytics, a pangenome-scale analysis of variants, providing insights into intra-species sequence variation and potential mutations for applications; (3) a global search function enabling broad and deep investigations across pangenomes to power research and bioengineering workflows; and (4) a bibliome of 833 open-access pangenomic papers and an interface with an LLM that can answer in-depth questions using the bibliome's knowledge. PanKB empowers researchers and bioengineers to harness the potential of microbial pangenomics and serves as a valuable resource bridging the gap between pangenomic data and practical applications.

Authors

  • Binhuan Sun
    Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220 Søltofts Plads, 2800 Kongens Lyngby, Denmark.
  • Liubov Pashkova
    Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220 Søltofts Plads, 2800 Kongens Lyngby, Denmark.
  • Pascal Aldo Pieters
    Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220 Søltofts Plads, 2800 Kongens Lyngby, Denmark.
  • Archana Sanjay Harke
    Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220 Søltofts Plads, 2800 Kongens Lyngby, Denmark.
  • Omkar Satyavan Mohite
    Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220 Søltofts Plads, 2800 Kongens Lyngby, Denmark.
  • Alberto Santos
    Disease Systems Biology, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark.
  • Daniel C Zielinski
    Department of Bioengineering, University of California, San Diego, La Jolla, CA, 92093-0412, USA.
  • Bernhard O Palsson
    Department of Bioengineering, University of California, San Diego, CA, USA.
  • Patrick Victor Phaneuf
    Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220 Søltofts Plads, 2800 Kongens Lyngby, Denmark.