PlasmidHunter: accurate and fast prediction of plasmid sequences using gene content profile and machine learning.

Journal: Briefings in Bioinformatics
PMID:

Abstract

Plasmids are extrachromosomal DNA molecules found in microorganisms. They often carry beneficial genes that help bacteria adapt to harsh conditions. Plasmids are also important tools in genetic engineering, gene therapy, and drug production. However, it can be difficult to distinguish plasmid sequences from chromosomal sequences in genomic and metagenomic data. Here, we have developed a new tool, PlasmidHunter, which uses machine learning to predict plasmid sequences based on gene content profiles. PlasmidHunter achieves high accuracy (up to 97.6%) and high speed in benchmark tests on both simulated contigs and real metagenomic plasmidome data, outperforming existing tools.
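
To illustrate the general idea of classifying a sequence from its gene content profile, the following is a minimal sketch and not PlasmidHunter's actual implementation. It assumes that each contig has already been annotated and reduced to a set of gene family names, and it uses a hand-written Bernoulli naive Bayes classifier purely as a stand-in, since the abstract does not state which model is used. The gene names, labels, and the functions `train` and `predict` are hypothetical.

```python
import math
from collections import Counter


def train(profiles, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on gene presence/absence profiles.

    profiles: list of sets of gene family names (one set per contig).
    labels:   list of class labels, e.g. "plasmid" or "chromosome".
    alpha:    Laplace smoothing constant (an illustrative default).
    """
    vocab = sorted(set().union(*profiles))
    model = {"vocab": vocab, "classes": {}}
    for c in sorted(set(labels)):
        rows = [p for p, y in zip(profiles, labels) if y == c]
        # Number of contigs of class c in which each gene family occurs.
        counts = Counter(g for p in rows for g in p)
        # Smoothed probability that a gene family is present in class c.
        probs = {g: (counts[g] + alpha) / (len(rows) + 2 * alpha) for g in vocab}
        log_prior = math.log(len(rows) / len(labels))
        model["classes"][c] = (log_prior, probs)
    return model


def predict(model, profile):
    """Return the class with the highest log-posterior for one profile.

    Gene families not seen during training are ignored.
    """
    best, best_score = None, -math.inf
    for c, (log_prior, probs) in model["classes"].items():
        score = log_prior
        for g in model["vocab"]:
            p = probs[g]
            score += math.log(p if g in profile else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy, made-up training data: sets of gene families found on each contig.
train_profiles = [
    {"repA", "traG", "parA"},
    {"repA", "mobA"},
    {"rpoB", "gyrA", "dnaA"},
    {"rpoB", "recA", "dnaA"},
]
train_labels = ["plasmid", "plasmid", "chromosome", "chromosome"]

model = train(train_profiles, train_labels)
plasmid_call = predict(model, {"repA", "traG"})
chromosome_call = predict(model, {"rpoB", "dnaA"})
```

In this toy setting, a contig carrying replication and transfer genes scores higher under the plasmid class, while one carrying housekeeping genes scores higher under the chromosome class. A real pipeline would derive the profiles from gene prediction and homology search against a reference database, and would be trained on far larger labelled sets.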

Authors

  • Renmao Tian
    Institute for Food Safety and Health, Illinois Institute of Technology, 6502 S Archer Rd, Bedford Park, IL 60501, United States.
  • Jizhong Zhou
    Institute for Environmental Genomics, Department of Microbiology and Plant Biology, University of Oklahoma, 101 David L Boren Blvd, Norman, OK 73019, United States.
  • Behzad Imanian
    Institute for Food Safety and Health, Illinois Institute of Technology, 6502 S Archer Rd, Bedford Park, IL 60501, United States.