Predicting Cd accumulation in crops and identifying nonlinear effects of multiple environmental factors based on machine learning models.

Journal: The Science of the total environment
PMID:

Abstract

Traditional prediction of the Cd content in crop grains (grain Cd) relies primarily on multiple linear regression models based on soil Cd content (soil Cd) and pH, neglecting interactions among factors and nonlinear causal links between external environmental factors and grain Cd. In this study, a comprehensive index system of multiple types of environmental factors, including soil properties, geology, climate, and anthropogenic activity, was constructed. Machine learning models (tree-based ensembles, support vector regression, and artificial neural networks) that predict grain Cd of rice and wheat from these environmental factor indexes were significantly more accurate than traditional linear regression models based on soil properties. Among them, the tree-based ensemble models XGBoost and random forest exhibited the highest accuracies for predicting grain Cd of rice and wheat, with R in the test dataset of 0.349 and 0.546, respectively. This study found that soil properties, including soil Cd, pH, and clay, have greater impacts on grain Cd of rice and wheat, with combined contribution rates of 65.2 % and 29.7 %, respectively. Because the wheat sampling areas are located in central and northern China, they are more constrained by precipitation and temperature than the rice sampling areas in the south. Geologic and climate factors have a greater impact on grain Cd of wheat, with a combined contribution rate of 49.9 %, which is higher than the corresponding rate of 20.9 % for rice. Furthermore, grain Cd of rice and wheat did not exhibit a strictly linear relationship with soil Cd, and excessively high soil Cd can reduce the bioconcentration factor of Cd accumulation in crops. Meanwhile, other environmental factors such as temperature, precipitation, and elevation have marginal effects on the increase of grain Cd in crops. This study provides a novel framework for optimizing traditional soil-plant transfer models and offers a step towards high-precision prediction of Cd content in crops.
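
The bioconcentration factor mentioned in the abstract is conventionally the ratio of Cd in the grain to Cd in the soil. The following is a minimal Python sketch, not the authors' method, of why a nonlinear uptake relationship makes that ratio fall as soil Cd rises. The function names, the saturating curve, and its parameters (max_grain_cd, half_sat) are hypothetical and chosen only for illustration; none of the numbers come from the study.

```python
def bioconcentration_factor(grain_cd: float, soil_cd: float) -> float:
    """Ratio of grain Cd to soil Cd (both in the same units, e.g. mg/kg)."""
    if soil_cd <= 0:
        raise ValueError("soil_cd must be positive")
    return grain_cd / soil_cd


def saturating_uptake(soil_cd: float,
                      max_grain_cd: float = 0.8,
                      half_sat: float = 1.5) -> float:
    """Hypothetical saturating uptake curve (assumption, not fitted to any data).

    Grain Cd approaches max_grain_cd as soil Cd grows, so the relationship
    between grain Cd and soil Cd is not linear.
    """
    return max_grain_cd * soil_cd / (half_sat + soil_cd)


# Illustrative soil Cd levels (made-up values, mg/kg).
soil_levels = [0.2, 0.5, 1.0, 2.0, 5.0]
bcf_values = [bioconcentration_factor(saturating_uptake(s), s) for s in soil_levels]

for s, b in zip(soil_levels, bcf_values):
    print(f"soil Cd = {s:.1f} mg/kg -> BCF = {b:.3f}")
```

Under this assumed curve the ratio shrinks at every step up in soil Cd, which is the qualitative pattern the abstract reports. A linear uptake model would instead give a constant ratio.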

Authors

  • Xiaosong Lu
    State Environmental Protection Key Laboratory of Soil Environmental Management and Pollution Control, Nanjing Institute of Environmental Sciences, Ministry of Ecology and Environment, Nanjing 210042, China.
  • Li Sun
    Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  • Ya Zhang
    Department of Plant Protection, College of Plant Protection, Hunan Agricultural University, Changsha, China. Electronic address: zhangya230@126.com.
  • Junyang Du
    State Environmental Protection Key Laboratory of Soil Environmental Management and Pollution Control, Nanjing Institute of Environmental Sciences, Ministry of Ecology and Environment, Nanjing 210042, China.
  • Guoqing Wang
    Department of Pathogenobiology, Basic Medical College of Jilin University, Changchun, Jilin, 130012, People's Republic of China. qing@jlu.edu.cn.
  • Xinghua Huang
    State Environmental Protection Key Laboratory of Soil Environmental Management and Pollution Control, Nanjing Institute of Environmental Sciences, Ministry of Ecology and Environment, Nanjing 210042, China; College of Environmental Science and Engineering, Yangzhou University, Yangzhou 225127, China.
  • Xuzhi Li
    Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing, 100094, China.
  • Xiaozhi Wang
    College of Information Science and Electronic Engineering, Hangzhou 310027, People's Republic of China.