Machine learning and deep learning modeling and simulation for predicting PM2.5 concentrations.

Journal: Chemosphere
Published Date:

Abstract

Particulate matter (PM) pollution seriously endangers human physical and mental health, so accurate prediction of PM concentrations is of great practical significance. This study independently collected one year of monitoring data, covering six main meteorological parameters and PM2.5 concentrations, at two monitoring sites in Hunan Province, central China. These datasets were used to train, validate, and evaluate two proposed models: an extreme gradient boosting (XGBoost) machine learning model and a fully connected neural network deep learning model. The two models were compared and analyzed, and their performance was optimized through parameter tuning. The XGBoost model showed the better predictive ability, with R higher than 0.761 on the complete test dataset. When the complete dataset was stratified into daytime and nighttime sub-sets, R increased to 0.856 on the nighttime test dataset. The importance of each meteorological feature and the mechanisms by which these variables influence PM2.5 concentrations were analyzed and discussed.
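The abstract reports model skill as R on the complete test set and on daytime and nighttime sub-sets. The sketch below shows one way that evaluation step could be computed. It makes two assumptions that the abstract does not state. First, R is taken to be the Pearson correlation coefficient between observed and predicted PM2.5. Second, "daytime" is taken to be a hypothetical 06:00 to 18:00 window. The function names (`pearson_r`, `is_daytime`, `stratified_r`) are illustrative and are not the authors' code.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("R is undefined when either series is constant")
    return cov / sqrt(var_t * var_p)


def is_daytime(hour: int, start: int = 6, end: int = 18) -> bool:
    # Hypothetical cutoff: the abstract does not say how day and night are split.
    return start <= hour < end


def stratified_r(
    hours: Sequence[int], y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """R on the complete set plus daytime and nighttime sub-sets (if each has >= 2 points)."""
    result = {"complete": pearson_r(y_true, y_pred)}
    rows = list(zip(hours, y_true, y_pred))
    subsets = {
        "daytime": [(t, p) for h, t, p in rows if is_daytime(h)],
        "nighttime": [(t, p) for h, t, p in rows if not is_daytime(h)],
    }
    for name, pairs in subsets.items():
        if len(pairs) >= 2:
            obs, pred = zip(*pairs)
            result[name] = pearson_r(obs, pred)
    return result
```

Usage: call `stratified_r(hours, observed, predicted)` on a test set, where `hours` holds the local hour of each sample. Computing R separately per sub-set, rather than only on the pooled data, is what makes a day/night difference in skill visible. The abstract reports such a difference, with R reaching 0.856 at night.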

Authors

  • Jian Peng
    Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA.
  • Haisheng Han
    School of Minerals Processing and Bioengineering, Central South University, Changsha, 410083, China.
  • Yong Yi
    Department of Liver Surgery, Liver Cancer Institute, Zhongshan Hospital, and Key Laboratory of Carcinogenesis and Cancer Invasion (Ministry of Education), Fudan University, Shanghai, People's Republic of China. yi.yong@zs-hospital.sh.cn.
  • Huimin Huang
  • Le Xie
    Research Institute of Med-X, Shanghai Jiao Tong University, Shanghai, China.