Machine learning based prediction by PlantCdMiner and experimental validation of cadmium-responsive genes in plants.
Journal:
Journal of Hazardous Materials
Published Date:
Aug 15, 2025
Abstract
Plants have evolved diverse adaptive mechanisms to sense and respond to environmental stimuli such as cadmium stress. The regulation of gene expression plays a critical role in plant responses to abiotic stress. However, homologous genes from different plant species, or even from different genotypes within the same species, often respond divergently to stress, and sequence homology does not necessarily imply functional similarity. Current homology-alignment approaches to predicting the transcriptional response to a specific stress therefore have inherent limitations. In this study, we trained supervised classification models using the Random Forest algorithm to predict cadmium-responsive genes from gene sequence features in Arabidopsis thaliana, Avicennia marina, Hordeum vulgare, and Nicotiana tabacum. Our models successfully predicted the transcriptional response to cadmium stress both within and across species. These results suggest that transcriptome data from well-studied species can be used to predict cadmium-responsive genes in species lacking such data. Cis-regulatory element analysis further indicated that MYB transcription factors (TFs) play essential roles in cadmium stress responses. Additionally, using yeast one-hybrid and dual-luciferase reporter assays, we experimentally confirmed that the MYB TF Am06526 activates the expression of AmPCR2. Finally, we developed PlantCdMiner (https://jasonxu.shinyapps.io/PlantCdMiner/), a web-based tool that uses machine learning to predict cadmium-responsive genes from genomic features and to visualize cis-regulatory elements.
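The abstract does not say which gene sequence features were used. As a purely illustrative sketch, the following Python snippet assumes one common choice, normalized k-mer frequencies computed over a promoter sequence. The function name `kmer_features`, the gene names, and the sequences are hypothetical and are not taken from the study or from PlantCdMiner.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k=3):
    """Return normalized k-mer frequencies for a DNA sequence.

    The vector has 4**k entries in a fixed (lexicographic ACGT) order, so
    vectors from different genes, or different species, are comparable.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made of A/C/G/T contribute; windows with N are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical promoter fragments, for illustration only.
promoters = {
    "geneA": "ACGTACGTAACCGGTT",
    "geneB": "TTTTAACCAACCGGNA",
}

# One fixed-length feature vector per gene (16 dinucleotide frequencies).
X = {gene: kmer_features(seq, k=2) for gene, seq in promoters.items()}
```

Fixed-length vectors of this kind, paired with responsive or non-responsive labels derived from transcriptome data, are the sort of input a Random Forest classifier could be trained on. Because the feature order does not depend on the species, a model fitted on one species could in principle be applied to sequences from another, which is the cross-species setting the abstract describes.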