Quantitative evaluation of hydrocarbon contamination in soil using hyperspectral data: a comparative study of machine learning models.
Journal:
Environmental monitoring and assessment
Published Date:
Jul 28, 2025
Abstract
This study evaluates the applicability of existing machine learning and deep learning techniques for the rapid prediction of hydrocarbon contamination in soils using hyperspectral data. Soil samples of three types (clayey, silty, and sandy) were synthetically contaminated with crude oil, diesel, and gasoline, creating a contamination range of 0 to 10,000 mg/kg. Hyperspectral imaging was employed to capture the spectral signatures of these samples, which were then analyzed using established models, including an XGBoost (XGB) regressor and neural networks. Gas chromatography-mass spectrometry (GC-MS) was used to obtain reference contamination values. The models were trained and tested to predict hydrocarbon levels, with performance evaluated using the coefficient of determination (R-squared) and the root mean square error (RMSE). The models demonstrated strong predictive ability, achieving an R-squared value of 0.96 and an RMSE of 600 mg/kg on the testing set. Performance varied depending on the petroleum type and soil matrix. Gasoline models showed lower accuracy due to less distinguishable spectral features, while diesel and crude oil models performed better. Incorporating selected spectral bands as model inputs further improved performance by reducing overfitting. Among the evaluated models, the XGB regressor consistently provided a good balance between accuracy and robustness. This study highlights the effectiveness of combining hyperspectral analysis with machine learning and deep learning models for soil contamination assessment. The findings support the use of ensemble-based models like XGB for practical spectral applications in environmental monitoring.
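For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how R-squared and RMSE are conventionally computed from reference and predicted concentrations. It is not the authors' code. The function names `rmse` and `r_squared` and the five sample values are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the inputs (e.g. mg/kg)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical example values (mg/kg), NOT taken from the study:
# reference concentrations (as a GC-MS measurement would provide) and model predictions.
reference = [0.0, 2500.0, 5000.0, 7500.0, 10000.0]
predicted = [300.0, 2200.0, 5400.0, 7100.0, 10500.0]

print(f"RMSE = {rmse(reference, predicted):.1f} mg/kg")
print(f"R-squared = {r_squared(reference, predicted):.3f}")
```

Because RMSE carries the units of the target, an RMSE of 600 mg/kg is best read against the 0 to 10,000 mg/kg contamination range, whereas R-squared is unitless and describes the share of variance in the reference values that the predictions explain.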