Groundwater health probability risk prediction through oral intake using advanced optimization methods.

Journal: Journal of contaminant hydrology

Published Date: Jul 7, 2025

Abstract

Assessing the cancer risk associated with oral intake of groundwater (GW) is crucial, particularly in regions that rely heavily on GW for human consumption and agriculture. This study is based on real field investigations and controlled laboratory experiments. We integrated the real experimental data with synthetic data produced by generative AI to construct a comprehensive dataset, then compared the predictive efficiency of the two data sources. This comparison allowed us to evaluate the reliability of generative AI in generating scientific data and provides critical insights into its applicability for enhancing experimental analysis. The study also evaluated four standalone models, Artificial Neural Networks (ANN), Gaussian Process Regression (GPR), Support Vector Machines (SVM), and Boosted Trees (BT), each with and without Bayesian Optimization (BO), for predicting the probability of cancer risk (PCR) from GW ingestion. On real data without BO, ANN achieved the lowest training errors (Mean Absolute Error, MAE = 0.1483; Mean Square Error, MSE = 0.1231; Root Mean Square Error, RMSE = 0.3508), while GPR, SVM, and BT exhibited higher training errors. In the testing phase, ANN continued to lead (MAE = 0.5733, MSE = 0.6356, RMSE = 0.7972). With BO, ANN-BO achieved MAE = 0.1686, MSE = 0.1097, and RMSE = 0.3312 during training, with GPR-BO performing almost identically (MAE = 0.1679, MSE = 0.1095, RMSE = 0.3310). During testing, ANN-BO improved further (MAE = 0.0902, MSE = 0.0129, RMSE = 0.1136). On synthetic data, however, even optimized models such as ANN-BO showed far higher testing errors (MAE = 15.718, MSE = 374.53, RMSE = 19.353), underscoring the limitations of synthetic data in capturing real-world complexity. The high error values across models indicate that synthetic data alone is insufficient for accurate health risk assessment. Real-world data therefore remains essential for enhancing predictive accuracy and minimizing errors, which emphasizes the crucial role of data quality in achieving reliable predictions of cancer risk from groundwater (GW) ingestion.
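For readers unfamiliar with the three error metrics reported in the abstract, the following is a minimal sketch of how MAE, MSE, and RMSE are conventionally computed. It is not the authors' code: the function names and the sample values in `y_true` and `y_pred` are hypothetical and chosen only for illustration. The final loop checks that each RMSE value reported in the abstract matches the square root of its paired MSE, to within rounding.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Square Error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical observed and predicted values (not taken from the study).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 4.5]

print(mae(y_true, y_pred))   # 0.5
print(mse(y_true, y_pred))   # 0.375
print(rmse(y_true, y_pred))

# (MSE, RMSE) pairs as reported in the abstract.
reported = [
    (0.1231, 0.3508),   # ANN, training, real data
    (0.6356, 0.7972),   # ANN, testing, real data
    (0.1097, 0.3312),   # ANN-BO, training, real data
    (0.1095, 0.3310),   # GPR-BO, training, real data
    (0.0129, 0.1136),   # ANN-BO, testing, real data
    (374.53, 19.353),   # ANN-BO, testing, synthetic data
]

# Each reported RMSE should equal sqrt(MSE) up to rounding.
consistent = all(abs(math.sqrt(m) - r) < 1e-3 for m, r in reported)
print(consistent)
```

Because RMSE is the square root of MSE, it is expressed in the same units as the predicted quantity, which makes it the easier of the two to compare directly with MAE.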

Authors

Fahad Jibrin Abdu

SDAIA-KFUPM Joint Research Center for Artificial Intelligence (JRCAI), King Fahd University of Petroleum & Minerals (KFUPM), Dhahran 31261, Saudi Arabia.

Sani I Abba

Interdisciplinary Research Centre for Membranes and Water Security, King Fahd University of Petroleum and Minerals, Dhahran, 31261, Saudi Arabia. Electronic address: saniisaabba86@gmail.com.

Jamilu Usman

Interdisciplinary Research Centre for Membrane and Water Security, King Fahd University of Petroleum and Minerals, Dhahran, 31261, Saudi Arabia.

Maad Alowaifeer

SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum & Minerals, 31261, Dhahran, Saudi Arabia; Electrical Engineering Department, King Fahd University of Petroleum & Minerals, 31261, Dhahran, Saudi Arabia.

Isam H Aljundi

Interdisciplinary Research Center for Membranes and Water Security, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia; Department of Chemical Engineering, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia.

Keywords

Bayes Theorem; Environmental Monitoring; Groundwater; Humans; Neoplasms; Neural Networks, Computer; Probability; Risk Assessment; Support Vector Machine; Water Pollutants, Chemical

External Resources

PubMed ID: 40663998
