Integrating partial least square structural equation modelling and machine learning for causal exploration of environmental phenomena.
Journal: Environmental research
PMID: 40081645
Abstract
Understanding the causes of environmental phenomena is crucial for promoting positive outcomes and mitigating negative ones. Partial least squares structural equation modelling (PLS-SEM) is becoming a valuable tool for evaluating causal relationships in ecological environment studies (EES). However, many PLS-SEM studies overlook nonlinear relationships and interactions between environmental factors, and they have not fully exploited the capabilities of machine learning. Using Gaoyang Lake in the Three Gorges Reservoir Region as a case study, this research presents a framework that combines several techniques to better understand the causes of Spring Harmful Algal Blooms (Spring HABs) from 2019 to 2023. The framework uses PLS-SEM to compare alternative causal structures and select the optimum, Bayesian Networks (BN) to identify alternative causal pathways, and Multivariate Adaptive Regression Splines (MARS) and Polynomial Regression (PR) to uncover interactions and nonlinearities among predictors. Our findings indicate that the BN-generated structure, implemented in PLS-SEM, achieved a lower (better) Bayesian Information Criterion (BIC) score than the initial PLS-SEM. MARS detected no interactions between latent variables. However, PR identified significant nonlinearities, and integrating them into the initial PLS-SEM produced the optimal model, with a Q²predict of 0.177, an RMSE of 0.967, an R² of 0.421, and a BIC of -23.497. Euphotic depth emerged as a critical factor in the occurrence of Spring HABs through its interaction with epilimnion depth. Surface nutrient levels (indicated by total phosphorus loadings) and meteorological elements (mean air temperature and sun hours) were the second and third most important latent variables, contributing 25.5% and 13.5% to Spring HABs, respectively.
This framework is recommended for improving the causal understanding of other site-specific environmental phenomena, providing a scientific basis for more effective environmental management.
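The abstract selects among candidate models by BIC and reports Q²predict for the final model. The following is a minimal sketch of that comparison logic, assuming a single observed predictor on synthetic data rather than the paper's latent-variable scores. It contrasts a linear fit with a quadratic (polynomial) fit. The BIC form n·ln(SSE/n) + k·ln(n) and the k-fold definition of Q²predict, which benchmarks out-of-fold errors against the training-fold mean, are assumptions that follow common PLS-SEM practice. They are not taken from the paper, and all function names are hypothetical.

```python
import numpy as np


def design_matrix(x, degree):
    """Hypothetical helper: columns 1, x, x**2, ..., x**degree."""
    return np.vander(x, N=degree + 1, increasing=True)


def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def bic(y, y_hat, k):
    """Assumed Gaussian-error BIC, up to an additive constant: n*ln(SSE/n) + k*ln(n)."""
    n = len(y)
    sse = float(np.sum((y - y_hat) ** 2))
    return n * np.log(sse / n) + k * np.log(n)


def q2_predict(x, y, degree, n_folds=5, seed=0):
    """Assumed Q2predict: 1 - SSE(out-of-fold predictions) / SSE(training-fold mean)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    sse_model = 0.0
    sse_naive = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        coef = fit_ols(design_matrix(x[train], degree), y[train])
        y_pred = design_matrix(x[test], degree) @ coef
        sse_model += np.sum((y[test] - y_pred) ** 2)
        # Naive benchmark: predict every held-out case with the training-fold mean.
        sse_naive += np.sum((y[test] - y[train].mean()) ** 2)
    return 1.0 - sse_model / sse_naive


# Synthetic data with a genuinely quadratic effect (illustration only, not study data).
rng = np.random.default_rng(42)
x = rng.uniform(-2, 2, size=200)
y = 0.5 * x + 0.8 * x**2 + rng.normal(scale=0.5, size=200)

results = {}
for degree in (1, 2):
    X = design_matrix(x, degree)
    y_hat = X @ fit_ols(X, y)
    results[degree] = {
        "bic": bic(y, y_hat, k=degree + 1),  # k = number of estimated coefficients
        "q2_predict": q2_predict(x, y, degree),
    }

for degree, r in results.items():
    print(f"degree={degree}: BIC={r['bic']:.1f}, Q2predict={r['q2_predict']:.3f}")
```

Because the synthetic response contains a squared term, the quadratic model should yield the lower BIC and the higher Q²predict. This mirrors, in a simplified observed-variable setting, why adding detected nonlinearities can improve a structural model.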