Unlocking urban soil secrets: machine learning and spectrometry in Berlin's heavy metal pollution study considering spatial data.
Journal:
Environmental monitoring and assessment
Published Date:
Jul 19, 2025
Abstract
Berlin has historically been impacted by heavy metal (HM) emissions, raising concerns about soil pollution. In this study, machine learning (ML) techniques were applied to predict HM concentrations across the Berlin metropolitan area. The dataset comprised 667 soil samples with spectrometry data and measured concentrations of nine HMs: arsenic (As), cadmium (Cd), cobalt (Co), chromium (Cr), copper (Cu), manganese (Mn), nickel (Ni), lead (Pb), and zinc (Zn). Four ML algorithms were used to model and predict HM concentrations: partial least squares regression (PLSR), support vector machine regression (SVMR), random forest (RF), and Gaussian process regression (GPR). To address the often-ignored spatial dimension, samples were also stratified into five land use/land cover (LULC) classes: park, forest, farmland, traffic area, and constructed land. On the full dataset, SVMR yielded the best performance, in predicting Zn (R² = 0.65, RMSE = 34.61 mg/kg, ME = 0.40). In the stratified models, PLSR performed best for Ni in farmland (R² = 0.77), while RF was most accurate for Ni in traffic areas (R² = 0.82). Model performance within farmland and traffic areas improved relative to the unstratified dataset, demonstrating the value of incorporating spatial context in soil HM prediction.
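The abstract compares model fit on the full dataset with fit inside individual LULC classes. The sketch below illustrates that kind of stratified evaluation, assuming the fit statistics are the conventional coefficient of determination (R²) and root mean squared error (RMSE). The function names and the sample values are hypothetical and are not taken from the study; the predictions stand in for the output of any of the four regressors.

```python
from collections import defaultdict
from math import sqrt


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the same unit as the inputs (mg/kg here)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def evaluate_by_stratum(records):
    """Score predictions on the pooled data ("all") and within each LULC class.

    records: iterable of (lulc_class, observed, predicted) tuples.
    """
    groups = defaultdict(lambda: ([], []))
    for lulc, obs, pred in records:
        for key in ("all", lulc):
            groups[key][0].append(obs)
            groups[key][1].append(pred)
    return {
        name: {"n": len(obs), "r2": r2_score(obs, pred), "rmse": rmse(obs, pred)}
        for name, (obs, pred) in groups.items()
    }


# Made-up (LULC class, observed Ni, predicted Ni) values in mg/kg,
# for illustration only.
records = [
    ("farmland", 10.0, 11.0),
    ("farmland", 20.0, 19.0),
    ("farmland", 30.0, 31.0),
    ("traffic area", 40.0, 38.0),
    ("traffic area", 50.0, 52.0),
    ("traffic area", 60.0, 58.0),
]

results = evaluate_by_stratum(records)
for name, scores in results.items():
    print(f"{name}: n={scores['n']}, R2={scores['r2']:.3f}, RMSE={scores['rmse']:.2f} mg/kg")
```

Because R² is measured against the variance within each subset, a stratum's R² and the pooled R² are not on a common scale; reporting RMSE alongside it, as the abstract does for Zn, gives an error in the original concentration unit.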