Transferability of Machine Learning Models for Geogenic Contaminated Groundwaters.

Journal: Environmental science & technology
Published Date:

Abstract

Machine learning models show promise in identifying geogenic contaminated groundwaters. Modeling in regions with no or limited samples is challenging due to the need for large training sets. One potential solution is transferring existing models to such regions. This study explores the transferability of high-fluoride groundwater models between basins in the Shanxi Rift System, considering six factors: modeling methods, predictor types, data size, sample/predictor ratio (SPR), predictor range, and data informing. Results show that transferability is achieved only when model predictors are based on hydrochemical parameters rather than surface parameters. Data informing, i.e., adding samples from challenging regions to the training set, further enhances transferability. Stepwise regression shows that hydrochemical predictors and data informing significantly improve transferability, while data size, SPR, and predictor range have no significant effects. Additionally, despite their stronger nonlinear capabilities, random forests and artificial neural networks do not necessarily surpass logistic regression in transferability. Lastly, we utilize the t-SNE algorithm to generate low-dimensional representations of data from different basins and compare these representations to elucidate the critical role of predictor types in transferability.
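
The transfer protocol described in the abstract (train in one basin, test in another, then repeat with "data informing") can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the single predictor merely stands in for a hydrochemical parameter, and the names make_basin, fit_logistic, accuracy, and transfer_experiment are hypothetical. It does not reproduce the study's data, predictors, or models.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_basin(n, shift, rng):
    """Synthetic basin (assumption): one predictor x centred at `shift`;
    label 1 stands for a high-fluoride sample."""
    samples = []
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        y = 1 if rng.random() < sigmoid(2.0 * x) else 0
        samples.append(([x], y))
    return samples


def fit_logistic(samples, lr=0.1, epochs=200):
    """Logistic regression fitted by plain batch gradient descent."""
    n_features = len(samples[0][0])
    w, b = [0.0] * n_features, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in samples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(model, samples):
    """Fraction of samples classified correctly at a 0.5 threshold."""
    w, b = model
    hits = 0
    for x, y in samples:
        predicted = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5
        hits += predicted == (y == 1)
    return hits / len(samples)


def transfer_experiment(seed=0, n_inform=20):
    """Compare plain transfer with data informing on held-out target samples."""
    rng = random.Random(seed)
    source = make_basin(200, 0.0, rng)   # basin with ample training data
    target = make_basin(200, 1.0, rng)   # basin the model is transferred to
    informing, held_out = target[:n_inform], target[n_inform:]
    plain = fit_logistic(source)                 # source samples only
    informed = fit_logistic(source + informing)  # source + a few target samples
    return {"plain": accuracy(plain, held_out),
            "informed": accuracy(informed, held_out)}


if __name__ == "__main__":
    print(transfer_experiment())
```

Both models are scored on the same held-out target samples, so the two accuracies are directly comparable. In this toy setup the predictor-label relationship is identical in both basins, so the numbers say nothing about the effect sizes reported in the study.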

Authors

  • Hailong Cao
    College of Resources and Environment, Yangtze University, Wuhan 430100, China.
  • Xianjun Xie
    School of Environmental Studies, China University of Geosciences, Wuhan 430074, China; State Environmental Protection Key Laboratory of Source Apportionment and Control of Aquatic Pollution, China University of Geosciences, Wuhan 430078, China. Electronic address: xjxie@cug.edu.cn.
  • Ziyi Xiao
    State Environmental Protection Key Laboratory of Source Apportionment and Control of Aquatic Pollution, China University of Geosciences, Wuhan 430078, China.
  • Wenjing Liu
    Children Rehabilitation Center, the Affiliated Hospital of Jining Medical University, Jining, China.