Optimizing machine learning model selection for landslide susceptibility mapping: analysis of similar performance metrics and the critical role of multi-criteria evaluation.

Journal: Environmental Science and Pollution Research International
Published Date:

Abstract

Landslide susceptibility mapping has become an essential task for ensuring economic and social sustainability. Machine learning algorithms have been widely applied to this task and have demonstrated high performance. However, researchers often face the challenge of validating these models and selecting the best one among them. This research emphasizes the importance of multi-criteria evaluation in assessing three ensemble learning models, namely gradient boosting classifier (GBC), light gradient boosting machine (LGBM), and extreme gradient boosting (XGBoost), used to produce a landslide susceptibility map (LSM) of the Oued Guebli watershed (northwestern region of Skikda, Algeria). A comprehensive database was created, incorporating a landslide inventory of 284 points and eight causal factors: lithology, slope, normalized difference vegetation index (NDVI), topographic wetness index (TWI), land use, and distance to roads, watercourses, and geological faults. The database was then split into a training set (70%) and a test set (30%). The models were first assessed using classical evaluation metrics. All three models exhibited similar performance, achieving high accuracy (0.9884), precision (0.9886), specificity (1.00), sensitivity (0.9884), F1-score (0.9884), RMSE (0.1078), and Pearson's correlation R (0.9770). This similarity highlights the need for complementary evaluation methods that can distinguish subtle differences between the models. The study therefore employs additional validation techniques. The first is the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, which revealed significant differences in model performance: GBC achieved the best AUC (0.9911), followed by XGBoost (0.9891) and LGBM (0.9794).
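The point that threshold-based metrics can coincide while AUC differs can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not state how the metrics were computed, so plain binary definitions are assumed, and the labels, the two score vectors, and the 0.5 cut-off are hypothetical toy values. Pearson's R is omitted for brevity.

```python
from math import sqrt


def threshold_metrics(y_true, y_pred):
    """Classical metrics from hard 0/1 predictions (1 = landslide)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
        "rmse": sqrt(sum((t - p) ** 2 for t, p in pairs) / n),
    }


def roc_auc(y_true, scores):
    """AUC as the share of (landslide, non-landslide) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def to_labels(scores, threshold=0.5):
    """Turn predicted probabilities into hard labels (assumed 0.5 cut-off)."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical toy data: both models miss the same landslide (4th point),
# but model B ranks it far lower than model A does.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.45, 0.4, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.8, 0.7, 0.15, 0.4, 0.3, 0.2, 0.1]

metrics_a = threshold_metrics(y_true, to_labels(scores_a))
metrics_b = threshold_metrics(y_true, to_labels(scores_b))
print(metrics_a == metrics_b)
print(roc_auc(y_true, scores_a), roc_auc(y_true, scores_b))
```

In this toy case the two metric dictionaries are identical, while the AUC values are 1.0 and 0.8125. AUC uses the full ranking of predicted probabilities rather than only the thresholded labels, which is why it can separate models that look the same on accuracy, precision, or F1-score.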
Furthermore, spatial validation, an innovative method used in this study, is based on the percentage of landslides that each model places in the very high susceptibility class. GBC achieved the highest rate (99.30%), followed by XGBoost (97.18%), while LGBM recorded the lowest (88.03%). Additionally, the study used the mean absolute error (MAE) to further evaluate the models' robustness, with values of 0.0039 for GBC, 0.0371 for XGBoost, and 0.1610 for LGBM. GBC is thus the best-performing model according to all three additional validation techniques. Selecting a high-performing model is essential for accurate LSMs, ensuring reliable predictions for risk assessment and disaster prevention. The integration of multiple validation techniques strengthens model robustness and enhances the model's applicability to resident safety, infrastructure preservation, and effective land-use planning within the Oued Guebli watershed.
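The spatial validation rate and the MAE can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's procedure: the abstract does not give the class breaks or say how MAE was computed, so the five equal-interval classes, the function names, and the choice to compare predicted probabilities with 0/1 labels are all hypothetical.

```python
# Assumed class scheme: five equal-interval classes on a 0-1 susceptibility index.
CLASS_LABELS = ("very low", "low", "moderate", "high", "very high")
CLASS_BREAKS = (0.2, 0.4, 0.6, 0.8)  # hypothetical breaks, not from the study


def classify(score, breaks=CLASS_BREAKS):
    """Map a susceptibility score to its class label."""
    index = sum(score >= b for b in breaks)  # number of breaks reached
    return CLASS_LABELS[index]


def very_high_hit_rate(landslide_scores, breaks=CLASS_BREAKS):
    """Percentage of inventory landslides falling in the 'very high' class."""
    hits = sum(1 for s in landslide_scores if classify(s, breaks) == "very high")
    return 100.0 * hits / len(landslide_scores)


def mean_absolute_error(y_true, scores):
    """MAE between 0/1 labels and predicted probabilities (assumed definition)."""
    return sum(abs(t - s) for t, s in zip(y_true, scores)) / len(y_true)


# Hypothetical scores sampled at four known landslide locations.
toy_landslide_scores = [0.95, 0.85, 0.81, 0.6]
print(very_high_hit_rate(toy_landslide_scores))   # 3 of 4 points are 'very high'
print(mean_absolute_error([1, 0], [0.75, 0.25]))
```

As a plausibility check on the reported figures, rates of 99.30%, 97.18%, and 88.03% would correspond to 282, 276, and 250 of the 284 inventory points, which suggests the rate was computed over the full inventory.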

Authors

  • Nadjib Mebirouk
    Civil Engineering Department, Faculty of Technology, Laboratory LMGHU, University 20 Août 1955-Skikda, Skikda, Algeria. n.mebirouk@univ-skikda.dz.
  • Moussa Amrane
    Civil Engineering Department, Faculty of Technology, Laboratory LGC-ROI, University of Batna 2 - Mostefa Ben Boulaid, 53, Constantine Road, Fesdis, 05078, Batna, Algeria.
  • Salah Messast
    Civil Engineering Department, Faculty of Technology, Laboratory LMGHU, University 20 Août 1955-Skikda, Skikda, Algeria.
  • Tahar Ayadat
    Department of Civil Engineering, College of Engineering, Prince Mohammad Bin Fahd University (PMU), Al Khobar, Saudi Arabia.
