The interpretable machine learning model for depression associated with heavy metals via EMR mining method.
Journal:
Scientific reports
PMID:
40155474
Abstract
Limited research exists on the association between depression and heavy metal exposure. This study aims to develop an interpretable, efficient, and robust machine learning (ML) model to identify depression linked to heavy metal exposure. Data were drawn from the US National Health and Nutrition Examination Survey (NHANES), covering 2013 to March 2020. We constructed 5 ML models to detect depression based on heavy metal exposure and assessed their performance using 10 discrimination metrics. The optimal model was selected after parameter tuning with a Genetic Algorithm (GA). To make the model's predictions interpretable, we applied the SHapley Additive exPlanation (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) algorithms. The study included 19,368 participants. The highest-performing model, an eXtreme Gradient Boosting (XGB) algorithm optimized with GA, identified depression using 16 heavy metal indicators (AUC: 0.686; 95% CI: 0.68-0.69; accuracy: 97.1%). SHAP analysis revealed that elevated blood cadmium levels had a positive influence on the model's prediction of depression, while urine concentrations of barium, thallium, tin, manganese, antimony, lead, and tungsten, along with blood levels of lead, cadmium, mercury, selenium, and manganese, showed a negative influence. In conclusion, the study used an efficient and robust GA-XGB model to identify depression linked to heavy metal exposure, supported by SHAP and LIME explanations. Blood cadmium was positively correlated with depression, whereas barium, thallium, tin, manganese, antimony, lead, and tungsten in urine, along with lead, cadmium, mercury, selenium, and manganese in blood, were negatively correlated with depression.
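
The "positive" and "negative" influence reported by the SHAP analysis refers to the sign of each feature's Shapley value: a positive value pushes a prediction above a reference prediction, and a negative value pushes it below. The Python sketch below illustrates that sign convention by computing exact Shapley values through brute-force enumeration. It is only an illustration: the function names, the three-feature linear score, its coefficients, and the input values are hypothetical assumptions, not the study's GA-XGB model or NHANES data, and practical SHAP tooling uses faster algorithms than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)
    features = list(range(n))

    def value(coalition):
        # Coalition members take the observed value; the rest take the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical linear risk score (NOT the paper's model)."""
    blood_cd, blood_pb, blood_se = z
    return 0.10 + 0.05 * blood_cd - 0.02 * blood_pb - 0.01 * blood_se


# Hypothetical reference and participant values, in arbitrary units.
baseline = [1.0, 1.0, 1.0]
participant = [3.0, 2.0, 1.5]

phi = shapley_values(toy_score, participant, baseline)
for name, contribution in zip(["blood_cd", "blood_pb", "blood_se"], phi):
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{name}: {contribution:+.3f} ({direction} the score)")
```

In this toy case the contributions sum to the difference between the participant's score and the baseline score, which is the additivity property that lets SHAP attribute a single prediction to individual features.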