Predicting blood levels of mercury and selenium in Amazonian riverines: A machine learning approach based on questionnaire data.
Journal:
Chemosphere
Published Date:
Apr 10, 2026
Abstract
This study aims to develop and evaluate machine learning models to predict blood levels of mercury and selenium in Brazilian riverines. Blood samples were collected from 1031 participants and analyzed for mercury and selenium concentrations. Seventeen algorithms were trained to estimate mercury and selenium blood levels from 175 questionnaire variables. The models were evaluated using the coefficient of determination (R²), mean absolute percentage error (MAPE), and mean absolute error (MAE, in ppb). The best-performing models were interpreted using Shapley Additive Explanations (SHAP). Multiple linear regression (MLR) was then conducted with the top 20 predictors from the SHAP analysis to strengthen the robustness of the findings. A gradient boosting regressor and a random forest were the best models for mercury and selenium, respectively. The trained models achieved R² = 0.24, MAPE = 53%, and MAE = 8.33 ppb for mercury, and R² = 0.13, MAPE = 27%, and MAE = 54.6 ppb for selenium. The error plots showed good prediction performance for individuals in lower concentration ranges but high errors for participants at higher levels, where the risk of toxic effects is usually greater. In both the SHAP and MLR analyses, fish consumption, particularly of predatory species, was the variable most strongly associated with increased mercury levels. For selenium, the SHAP analysis showed that nut intake and consumption of some herbivorous fish were associated with higher levels. The limited performance of the models restricted their applicability, but the SHAP and MLR analyses identified several variables that could be valuable for managing exposure to these elements.
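As an illustration of the three evaluation metrics named in the abstract, the following is a minimal sketch in plain Python using their standard textbook definitions. It is not the authors' code: the function names and the sample concentrations are hypothetical and chosen only to show how each metric is computed.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (requires non-zero y_true)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same units as the inputs (e.g. ppb)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical measured vs. predicted blood concentrations (ppb); not study data.
measured = [4.0, 8.0, 10.0, 20.0, 58.0]
predicted = [5.0, 7.0, 12.0, 16.0, 30.0]

print(f"R2   = {r2_score(measured, predicted):.3f}")
print(f"MAPE = {mape(measured, predicted):.1f}%")
print(f"MAE  = {mae(measured, predicted):.2f} ppb")
```

In this made-up example, most of the absolute error comes from the single high-concentration sample, which mirrors the pattern the abstract reports: errors concentrate at higher levels even when predictions at lower levels are close.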