Machine Learning Prediction of Laccase-Catalyzed Oxidation of Aromatic Compounds Using Curated Enzyme-Specific Datasets

Journal: Journal of Computational Chemistry
Published Date:

Abstract

Laccases are multi-copper oxidases that oxidize a wide range of aromatic and non-aromatic compounds using molecular oxygen, producing water as the sole byproduct, which makes them attractive biocatalysts for green chemistry. However, whether a laccase can oxidize a given substrate depends on a complex interplay of molecular structure, enzyme properties, redox potential, and environmental context, so laccase-substrate compatibility is hard to predict. We apply machine learning models to pre-screen laccase-substrate combinations and thereby streamline experimental workflows. We evaluate four classical classifiers and a transformer-based model (ChemBERTa) on three in-house curated datasets of aromatic substrates with oxidation profiles for distinct laccases. Overall, the tested models achieve comparable performance, with the random forest classifier (RFC) showing greater stability across data splits and laccases. This assessment is complemented by RFC feature-importance and ChemBERTa attention analyses, which highlight molecular features associated with oxidation outcomes. We also introduce a lightweight tool that visualizes ChemBERTa predictions by mapping SMILES attributions onto molecular graphs. These findings provide a robust, interpretable framework for accelerating laccase-substrate discovery.
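
The idea of mapping SMILES attributions onto a molecular graph can be illustrated with a minimal sketch. This is not the authors' tool: the regex tokenizer, the function names `tokenize` and `atom_scores`, and the `fold_non_atoms` option are all hypothetical, and the sketch assumes one attribution score per regex-level token, whereas ChemBERTa's own tokenizer may split the string differently. It also assumes that the toolkit used for drawing numbers atoms in the order they appear in the SMILES string, which should be checked for the toolkit in use.

```python
import re

# Regex-level SMILES tokens (an assumption, not ChemBERTa's tokenizer):
# bracket atoms, two-letter halogens, organic-subset atoms, wildcard,
# two-digit and one-digit ring closures, then bonds, branches, and dots.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\*|%\d{2}|\d|[()=#\-+\\/:~.@$]"
)
# Subset of tokens that correspond to an atom (a node in the molecular graph).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\*")


def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def atom_scores(smiles, token_scores, fold_non_atoms=True):
    """Return one score per atom, in order of appearance in the SMILES string.

    Scores on non-atom tokens (bonds, branches, ring closures) are added to
    the most recent atom when fold_non_atoms is True, and dropped otherwise.
    Non-atom tokens that precede the first atom are always dropped.
    """
    tokens = tokenize(smiles)
    if len(tokens) != len(token_scores):
        raise ValueError("need exactly one score per token")
    scores = []
    for token, score in zip(tokens, token_scores):
        if ATOM_TOKEN.fullmatch(token):
            scores.append(score)
        elif fold_non_atoms and scores:
            scores[-1] += score
    return scores


# Phenol, with made-up attribution values (illustrative only, not model output).
phenol = "Oc1ccccc1"
example_scores = [0.5, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0, 0.125, 0.125]
per_atom = atom_scores(phenol, example_scores)
```

The resulting list has one entry per heavy atom and could be passed to a drawing routine as per-atom highlight weights. Folding non-atom scores onto the preceding atom is only one possible convention; dropping them or splitting a ring-closure score between the two ring-closing atoms are equally defensible choices.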

Authors
