Discovery of High-Performance Extremophiles and Extremozymes Using Machine Learning and Structure-Based Clustering.
Journal:
Environmental science & technology
Published Date:
Jun 3, 2026
Abstract
The exploration of extremophiles (microorganisms that thrive in extreme environments) is crucial for advancing biotechnological applications and understanding the limits of life. However, traditional methods for identifying extremophiles are labor-intensive and inefficient. Here we introduce iExtreme, a machine learning model that predicts extremophile characteristics using a Support Vector Machine (SVM) framework built on k-mer features of nucleotides and codon combinations extracted from genome sequences. Trained on a curated data set of 1030 extremophilic genomes, the model achieves accuracies of 0.988, 0.939, and 0.938 in identifying halophiles, thermophiles, and pH-philes, respectively. Using iExtreme, we discovered 520 novel extremophilic species and 5255 genomes from various databases. Through structure-based protein clustering, we also found a large number of novel extremozymes, including D-psicose 3-epimerases (DPEase) and α-amylases. These results demonstrate the utility of iExtreme for discovering extremophiles and extremozymes.
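The abstract names two inputs to the SVM: nucleotide k-mer frequencies and codon-combination features. The sketch below shows one plausible way to turn a genome into such a fixed-length feature vector. It rests on several assumptions that are not confirmed by the source:

- The k values (1 to 3).
- The normalization to relative frequencies.
- Counting in-frame codons over a supplied list of coding sequences.
- The helper names `kmer_frequencies`, `codon_frequencies`, and `genome_features`, which are hypothetical.

This is not iExtreme's actual implementation.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq: str, k: int) -> list:
    """Relative frequencies of all 4**k overlapping k-mers (in ACGT order).

    k-mers containing ambiguous bases such as N are ignored (assumption).
    """
    seq = seq.upper()
    vocab = ["".join(p) for p in product(BASES, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]


def codon_frequencies(cds_list: list) -> list:
    """Relative frequencies of the 64 codons, read in frame from each CDS."""
    vocab = ["".join(p) for p in product(BASES, repeat=3)]
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        # Step through each coding sequence three bases at a time.
        counts.update(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts[c] for c in vocab)
    return [counts[c] / total if total else 0.0 for c in vocab]


def genome_features(genome: str, cds_list: list, ks=(1, 2, 3)) -> list:
    """Concatenate k-mer blocks for each k with a 64-dim codon block.

    With ks=(1, 2, 3) the vector has 4 + 16 + 64 + 64 = 148 entries.
    """
    features = []
    for k in ks:
        features.extend(kmer_frequencies(genome, k))
    features.extend(codon_frequencies(cds_list))
    return features


if __name__ == "__main__":
    vec = genome_features("ACGTACGTNNACGT", ["ATGAAATAA"])
    print(len(vec))
```

Every genome maps to a vector of the same length regardless of its size. That property lets vectors from many genomes be stacked into a matrix for an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`. The kernel, the exact k-mer lengths, and the codon feature definition used by iExtreme are not given in the abstract.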