Identifying predictors of spatiotemporal variations in residential radon concentrations across North Carolina using machine learning analytics.

Journal: Environmental pollution (Barking, Essex : 1987)
PMID:

Abstract

Radon is a naturally occurring radioactive gas derived from the decay of uranium in the Earth's crust. Radon exposure is the leading cause of lung cancer among non-smokers in the US. Radon infiltrates homes through soil and building foundations. This study advances methodologies for assessing residential radon exposure by leveraging a comprehensive dataset of 126,382 short-term (2-7 days) radon test results collected across North Carolina from 2010 to 2020. We combine linear regression with advanced machine learning techniques, including random forest models. Analysis of first-time tested radon levels through linear regression, linear mixed-effects models (LME), and generalized additive models (GAM) reveals that elevation, proximity to geological faults, and soil moisture are pivotal in determining radon concentration. Specifically, elevation consistently shows a positive relationship with radon levels across models (linear regression: β = 0.12, p < 0.001; LME: β = 0.17, p < 0.001; GAM: β = 0.11, p < 0.001). Conversely, the distance to geological faults correlates negatively with radon concentration (linear regression: β = -0.11, p < 0.001; LME: β = -0.06, p < 0.001; GAM: β = -0.07, p < 0.001), indicating lower radon levels farther from faults. Using the random forest model, our study identifies the most influential environmental predictors of first-time tested radon levels. Elevation is the most influential variable, followed by median instantaneous surface pressure and soil moisture in the upper 10 cm layer, illustrating the significant role of geological and immediate surface conditions. Additional important factors include precipitation, mean temperature, and deeper soil moisture levels (40-200 cm), which underscores the influence of climate on radon variability. Root zone soil moisture and the Normalized Difference Vegetation Index (NDVI) also contribute to predicting radon levels, reflecting the importance of soil and vegetation dynamics in radon emanation. By integrating multiple statistical models, this research provides a nuanced understanding of the predictors of radon concentration, enhancing predictive accuracy and reliability.
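
The sign pattern reported above for the regression coefficients (positive for elevation, negative for distance to faults) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the effect sizes 0.5 and -0.3 are invented for illustration, the helper names (zscore, solve, standardized_betas) are hypothetical, and it assumes coefficients are computed on z-scored variables, which the abstract does not state.

```python
import random

def zscore(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def standardized_betas(predictors, outcome):
    """Ordinary least squares on z-scored columns (no intercept: all means are zero)."""
    zx = [zscore(col) for col in predictors]
    zy = zscore(outcome)
    k = len(zx)
    xtx = [[sum(p * q for p, q in zip(zx[i], zx[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(zx[i], zy)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic data only: the coefficients 0.5 and -0.3 are made up, not study results.
random.seed(0)
n = 2000
elevation = [random.gauss(0, 1) for _ in range(n)]
fault_distance = [random.gauss(0, 1) for _ in range(n)]
radon = [0.5 * e - 0.3 * d + random.gauss(0, 1)
         for e, d in zip(elevation, fault_distance)]

betas = standardized_betas([elevation, fault_distance], radon)
print("elevation beta:", round(betas[0], 3))
print("fault distance beta:", round(betas[1], 3))
```

In this sketch a positive coefficient means radon rises with the predictor and a negative one means it falls, which is how the elevation and fault-distance estimates above are read. The LME, GAM, and random forest analyses named in the abstract are not reproduced here.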

Authors

  • Zhenchun Yang
    Duke Global Health Institute, Durham, NC, 27708, United States.
  • Lauren Prox
    Nicholas School of the Environment, Duke University, Durham, NC, 27708, United States.
  • Clare Meernik
    Department of Population Health Sciences, Duke University, Durham, NC, 27708, United States.
  • Yadurshini Raveendran
    Duke Cancer Institute, Duke University, Durham, NC, 27708, United States.
  • David J Press
    Department of Population Health Sciences, Duke University, Durham, NC, 27708, United States.
  • Phillip Gibson
    North Carolina Department of Health and Human Services, Raleigh, NC, 27612, United States.
  • Amie Koch
    Duke School of Nursing, Duke University, Box 3322, Durham, NC, 27710, United States.
  • Olufemi Ajumobi
    North Carolina Department of Health and Human Services, Raleigh, NC, 27612, United States.
  • Ruoxue Chen
    Nicholas School of the Environment, Duke University, Durham, NC, 27708, United States.
  • Junfeng Jim Zhang
    Duke Global Health Institute, Durham, NC, 27708, United States; Nicholas School of the Environment, Duke University, Durham, NC, 27708, United States.
  • Tomi Akinyemiju
    Department of Population Health Sciences, Duke University, Durham, NC, 27708, United States; Duke Cancer Institute, Duke University, Durham, NC, 27708, United States.