PM2.5 concentration prediction using machine learning algorithms: an approach to virtual monitoring stations.
Journal:
Scientific reports
PMID:
40057563
Abstract
PM2.5 is one of the most important air pollutants, and monitoring its levels is essential for keeping concentrations under control. This study attempts to predict PM2.5 concentrations using four machine learning (ML) models: Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting Regressor (XGBR), Random Forest (RF) and Gradient Boosting Regressor (GBR). From 2016 to 2022, the mean and maximum recorded PM2.5 concentrations were 32.84 µg/m³ and 160.25 µg/m³, respectively, indicating occasional episodes of high pollution. PM2.5 concentrations dropped below 30 µg/m³ in 2018 due to reduced human activities during COVID-19 lockdowns, but rose significantly during 2021 because of the ongoing operation of heavy industries after the lockdowns. The ML models performed very well in predicting PM2.5 concentrations, with around 95% of their predictions falling within the factor of the observed concentration. Among the four ML algorithms, GBR performed best, with the lowest MSE (5.33) and RMSE (2.31) as well as high accuracy measures. This suggests that GBR is the best of the four models at reducing large errors, making it more robust in capturing variations in PM2.5 levels. In conclusion, the study proposes a method for obtaining high-accuracy PM2.5 predictions using ML, which is useful for air quality monitoring on a global scale and for improving acute exposure assessment in epidemiological research.
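The error measures named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' code. The function names, the sample values, and the default factor of 2 in `fraction_within_factor` are assumptions made for illustration; the abstract does not state which factor was used. The only figures taken from the abstract are the GBR scores, which are checked for internal consistency (RMSE is the square root of MSE).

```python
import math


def mse(observed, predicted):
    """Mean squared error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(observed, predicted))


def fraction_within_factor(observed, predicted, factor=2.0):
    """Share of predictions with 1/factor <= predicted/observed <= factor.

    The default factor of 2 is an assumption; the abstract gives no value.
    Pairs with a non-positive observation are skipped to avoid dividing by zero.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o > 0]
    if not pairs:
        raise ValueError("no positive observations to compare against")
    hits = sum(1 for o, p in pairs if 1.0 / factor <= p / o <= factor)
    return hits / len(pairs)


# Consistency check on the GBR scores reported in the abstract.
reported_mse = 5.33
implied_rmse = math.sqrt(reported_mse)  # rounds to the reported 2.31

# Hypothetical concentrations in µg/m³, invented for illustration only.
observed = [30.0, 40.0, 50.0, 160.0]
predicted = [28.0, 43.0, 49.0, 70.0]

print(round(implied_rmse, 2))
print(mse(observed, predicted), rmse(observed, predicted))
print(fraction_within_factor(observed, predicted))
```

In the invented sample, the single badly underpredicted peak dominates the MSE because errors are squared. This is why a low MSE and RMSE are read as evidence that a model avoids large errors.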