Methodological Integration of Machine Learning and Geospatial Analysis for PM Pollution Mapping.
Journal:
MethodsX
Published Date:
Apr 17, 2025
Abstract
Air pollution mitigation necessitates accurate spatial modelling to inform public health interventions. Traditional approaches inadequately capture complex predictor-pollutant interactions, whereas machine learning (ML) offers a superior capacity for modelling nonlinear relationships. This study compares three ML algorithms, Random Forest (RF), K-Nearest Neighbors (KNN), and Naïve Bayes (NB), using annual PM data from 11 monitoring stations alongside atmospheric, urban, and terrain covariates. The methodological framework employed rigorous preprocessing and cross-validation to classify pollution into three categorical levels. Results demonstrate RF's superior performance: it achieved 94% balanced accuracy and 97% specificity, significantly outperforming KNN (92%) and NB (89%). RF excelled in capturing spatial heterogeneity and complex variable interactions, while KNN and NB exhibited limitations in managing feature dependencies and localized variability. Despite its computational demands, the findings substantiate RF's reliability for robust air quality monitoring applications. The study contributes valuable insights for implementing scalable pollution prediction systems in resource-constrained urban environments, while acknowledging the interpretability challenges inherent to complex ML models.
• Preprocessing of spatial data from various sources, including the handling of missing/abnormal data, analysis, and normalization
• Implementation of the three ML algorithms with rigorous hyperparameter tuning, model validation, and performance assessment
• Mapping of PM hotspots by gradient direction and distance from the city center
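The balanced accuracy and specificity figures quoted in the abstract can be made concrete with a small worked example. The sketch below is not the authors' code: it computes both metrics from a three-class confusion matrix in plain Python, assuming macro averaging over classes (the abstract does not state which averaging was used). The class names and the toy labels are hypothetical and exist only to illustrate the arithmetic.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def recall_and_specificity(matrix: List[List[int]]) -> Tuple[List[float], List[float]]:
    """One-vs-rest recall and specificity for each class."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    recalls, specificities = [], []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                        # true class k, predicted otherwise
        fp = sum(matrix[r][k] for r in range(n)) - tp   # predicted k, true class differs
        tn = total - tp - fn - fp
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        specificities.append(tn / (tn + fp) if tn + fp else 0.0)
    return recalls, specificities


def balanced_accuracy(matrix: List[List[int]]) -> float:
    """Mean of per-class recall (macro average)."""
    recalls, _ = recall_and_specificity(matrix)
    return sum(recalls) / len(recalls)


def macro_specificity(matrix: List[List[int]]) -> float:
    """Mean of per-class specificity (assumed averaging scheme)."""
    _, specificities = recall_and_specificity(matrix)
    return sum(specificities) / len(specificities)


# Hypothetical pollution levels and toy predictions, for illustration only.
LABELS = ["low", "moderate", "high"]
y_true = ["low", "low", "moderate", "moderate", "high", "high"]
y_pred = ["low", "moderate", "moderate", "moderate", "high", "high"]

cm = confusion_matrix(y_true, y_pred, LABELS)
print("confusion matrix:", cm)
print("balanced accuracy:", round(balanced_accuracy(cm), 3))
print("macro specificity:", round(macro_specificity(cm), 3))
```

Balanced accuracy weights each pollution level equally regardless of how many stations fall into it, which matters when only 11 stations are split across three classes.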