Beyond Residential Ambient Concentrations: Quantifying Exposure Error and Advancing Personal PM2.5 Prediction with a Scalable Modeling Framework.
Journal:
Environmental Science & Technology
Published Date:
Jan 26, 2026
Abstract
Accurate assessment of personal PM2.5 exposure is essential but challenging in large-scale epidemiology, as conventional residential ambient data often lead to exposure misclassification. This study aimed to quantify errors in ambient data proxies and develop a scalable modeling framework for personal exposure prediction using readily available data. We conducted a panel study with 12 adults from three diverse Chinese cities, collecting 4571 person-hours of personal PM2.5 measurements. These were compared against three ambient data sources to quantify relative errors. We developed a modeling framework integrating ambient concentrations, meteorological variables, and basic personal characteristics, incorporating systematic preprocessing, feature engineering, variable selection, and multialgorithm comparison optimized through hyperparameter tuning and cross-validation. Results showed substantial personal-ambient exposure discrepancies, with relative errors of 39-48% at the daily average level. The framework successfully predicted personal exposure, with a Random Forest model using daily monitoring-station data achieving the highest performance (R² = 0.87). SHAP analysis identified ambient PM2.5 as the dominant predictor, with personal traits and meteorology also contributing significantly. This work provides a validated, end-to-end modeling framework that moves beyond ambient proxies, offering a standardized workflow to refine personal exposure estimates in large cohorts and enhance the validity of air pollution health studies.
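As an illustration of the personal-versus-ambient comparison described in the abstract, the sketch below computes a mean relative error between paired daily averages. The abstract does not give the exact error formula, so the definition used here (mean of |ambient − personal| / personal, expressed in percent) is an assumption, as are the function name `mean_relative_error` and the sample values, which are hypothetical and not study data.

```python
from statistics import mean


def mean_relative_error(personal, ambient):
    """Mean of |ambient - personal| / personal over paired values, in percent.

    Assumed definition for illustration; the study's formula may differ.
    """
    if len(personal) != len(ambient):
        raise ValueError("personal and ambient series must have equal length")
    # Skip pairs where the personal value is non-positive to avoid division by zero.
    pairs = [(p, a) for p, a in zip(personal, ambient) if p > 0]
    if not pairs:
        raise ValueError("no valid pairs with a positive personal concentration")
    return 100.0 * mean(abs(a - p) / p for p, a in pairs)


# Hypothetical daily averages in µg/m³ (illustrative values only).
personal_daily = [20.0, 40.0, 50.0]
ambient_daily = [30.0, 30.0, 75.0]

error_pct = mean_relative_error(personal_daily, ambient_daily)
print(f"Mean relative error: {error_pct:.1f}%")
```

Normalizing by the personal measurement treats it as the reference value; a study that normalizes by the ambient value instead would report different percentages from the same data.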