Exploring multivariate machine learning frameworks to parallelize PM2.5 simultaneous estimations across the continental United States.

Journal: Environmental pollution (Barking, Essex : 1987)

PMID: 40204145

Abstract

Fine particulate matter (PM2.5) comprises diverse chemical components, including elemental carbon (EC), silicon (SI), sulfate (SO4), and calcium (CA), each linked to varied health and environmental impacts. Accurately estimating these components' spatial and temporal distributions is crucial for regulatory policies and public health. This study developed and evaluated multivariate machine learning models, including Random Forest (RF) and XGBoost (XGB), to estimate daily concentrations of EC, SI, SO4, and CA across the contiguous United States from 2000 to 2019. Unlike traditional univariate approaches, multivariate models capture interdependencies among components, improving accuracy and efficiency. Using data from 534 monitoring sites and 187 predictor variables derived from satellite observations, reanalysis datasets, and geographical sources, we implemented univariate and multivariate RF and XGB models (MRF and MXGBoost). Performance was assessed using R-squared metrics, and feature importance was evaluated with SHAP values. MXGBoost outperformed other models, achieving R² values of 70.2 % for EC, 79.23 % for SO4, 61.57 % for SI, and 59.5 % for CA, with spatial R² exceeding 93 % and temporal R² as high as 82.23 % for SO4. Key predictors included wind speed, relative humidity, and aerosol optical depth. The findings highlight the advantages of multivariate modeling in capturing the interdependencies among PM2.5 components, resulting in improved estimation accuracy and computational efficiency. This approach offers valuable applications in air quality management and public health, emphasizing the need to refine multivariate frameworks and explore their applicability to other pollutants.
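The abstract reports overall, spatial, and temporal R-squared figures but does not define them. The sketch below illustrates one convention that is common in exposure modelling and is assumed here, not taken from the paper: spatial R-squared compares per-site means of observed and predicted values, and temporal R-squared compares daily deviations from those site means. The function names, the record layout, and the toy values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def r_squared(obs, pred):
    """R-squared as the squared Pearson correlation of observed vs. predicted."""
    mo, mp = mean(obs), mean(pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return (sxy * sxy) / (sxx * syy)


def r2_summary(records):
    """records: list of (site_id, observed, predicted) daily values."""
    by_site = defaultdict(list)
    for site, o, p in records:
        by_site[site].append((o, p))

    site_obs, site_pred = [], []      # one mean per site: spatial component
    delta_obs, delta_pred = [], []    # deviations from site mean: temporal component
    for pairs in by_site.values():
        mo = mean(o for o, _ in pairs)
        mp = mean(p for _, p in pairs)
        site_obs.append(mo)
        site_pred.append(mp)
        delta_obs.extend(o - mo for o, _ in pairs)
        delta_pred.extend(p - mp for _, p in pairs)

    return {
        "overall": r_squared([o for _, o, _ in records],
                             [p for _, _, p in records]),
        "spatial": r_squared(site_obs, site_pred),
        "temporal": r_squared(delta_obs, delta_pred),
    }


# Toy data (made up for illustration): three sites, three days each.
records = [
    ("A", 1.0, 1.2), ("A", 2.0, 1.9), ("A", 3.0, 2.7),
    ("B", 4.0, 4.1), ("B", 5.0, 5.3), ("B", 6.0, 5.8),
    ("C", 2.0, 2.4), ("C", 2.5, 2.2), ("C", 3.5, 3.6),
]

summary = r2_summary(records)
for name, value in summary.items():
    # Printed as percentages, matching how the abstract reports R-squared.
    print(f"{name}: {100 * value:.2f} %")
```

In a multi-component setting such as the one described, a summary like this would be computed separately for each component (EC, SI, sulfate, CA), typically on held-out predictions.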

Authors

Kimiya Gohari
Department of Environmental Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States.

Ali Sheidaei
Department of Environmental Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States. Electronic address: ali.sheidaei@mssm.edu.

Maayan Yitshak-Sade
Department of Environmental Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States.

Elena Colicino
Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Itai Kloog
Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel.

Keywords

Air Pollutants; Air Pollution; Environmental Monitoring; Machine Learning; Particulate Matter; United States

