XenoBug: machine learning-based tool to predict pollutant-degrading enzymes from environmental metagenomes.
Journal:
NAR Genomics and Bioinformatics
PMID:
40314024
Abstract
Machine learning-based methods for identifying novel bacterial enzymes that degrade a wide range of xenobiotics offer enormous potential for the bioremediation of toxic, carcinogenic, and recalcitrant compounds such as pesticides, plastics, petroleum, and pharmacological products, which adversely impact ecology and health. Using 6814 diverse substrates involved in ∼141 200 biochemical reactions, we have developed 'XenoBug', a machine learning-based tool that predicts the bacterial enzymes, the enzymatic reactions, and the species capable of biodegrading a xenobiotic, along with the metagenomic source of the predicted enzymes. The models were trained on a hybrid feature set comprising 1603 molecular descriptors together with linear and circular fingerprints. The tool also incorporates enzyme datasets consisting of ∼3.3 million enzyme sequences derived from an environmental metagenome database and ∼16 million enzymes from ∼38 000 bacterial genomes. Across the different reaction classes, XenoBug achieves binary accuracies above 0.75 and F1 scores above 0.62. XenoBug was also validated on a set of diverse classes of xenobiotics, including pesticides, environmental pollutants, pharmacological products, and hydrocarbons, that are known to be degraded by bacterial enzymes. For the molecules in the validation set, XenoBug predicted known as well as previously unreported metabolic enzymes, demonstrating its utility for predicting the metabolism of input xenobiotic molecules. XenoBug is available at: https://metabiosys.iiserb.ac.in/xenobug.
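
The abstract reports two metrics per reaction class, binary accuracy and F1 score. The minimal Python sketch below shows how the two are computed from 0/1 labels and why they can differ when positive examples are rare. The function name `binary_metrics` and the toy labels are hypothetical illustrations, not taken from XenoBug or its data.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, f1) for two equal-length sequences of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Accuracy: fraction of all predictions that are correct.
    accuracy = (tp + tn) / len(pairs)

    # F1: harmonic mean of precision and recall, written in terms of counts.
    # Defined as 0.0 here when there are no positives at all (an assumption).
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1


# Hypothetical toy labels: 1 = substrate undergoes the reaction class, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

accuracy, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, f1={f1:.2f}")
```

In this toy case the accuracy is higher than the F1 score because the correctly predicted negatives count toward accuracy but not toward F1, which is one reason to report both metrics.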