Variable selection in social-environmental data: sparse regression and tree ensemble machine learning approaches.
Journal:
BMC Medical Research Methodology
Published Date:
Dec 10, 2020
Abstract
BACKGROUND: Social-environmental data obtained from the US Census is an important resource for understanding health disparities, but rarely is the full dataset utilized for analysis. A barrier to incorporating the full data is a lack of solid recommendations for variable selection, with researchers often hand-selecting a few variables. Thus, we evaluated the ability of empirical machine learning approaches to identify social-environmental factors having a true association with a health outcome.