Optimizing Feature Selection in Causal Inference: A Three-Stage Computational Framework for Unbiased Estimation
Journal: arXiv
Published Date: Feb 1, 2025
Abstract
Feature selection is an important but challenging task in causal inference
for obtaining unbiased estimates of causal quantities. Properly selected
features not only substantially reduce the time required to run a matching
algorithm but, more importantly, can also reduce the bias and variance of the
estimated causal quantities. When feature selection techniques are applied in
causal inference, the crucial criterion is to select variables that, when used
for matching, yield unbiased and robust estimates of causal quantities. Recent
research suggests that balancing only on treatment-associated variables
introduces bias, while balancing on spurious variables increases variance. To
address this issue, we propose an enhanced three-stage framework that shows a
significant improvement in selecting the desired subset of variables compared
to the existing state-of-the-art feature selection framework for causal
inference, resulting in lower bias and variance when estimating the causal
quantity. We evaluated the proposed framework on state-of-the-art synthetic
data across various settings and observed superior performance within a
feasible computation time, ensuring scalability to large-scale datasets.
Finally, to demonstrate the applicability of our methodology to large-scale
real-world data, we evaluated an important US healthcare policy question
related to the opioid epidemic: whether opioid use disorder has a causal
relationship with suicidal behavior.
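As a rough illustration of why the choice of matching features matters, the sketch below is not the three-stage framework. It uses a hypothetical toy data-generating process in which x_conf is a confounder (it drives both treatment and outcome), x_spur is spurious and x_out predicts only the outcome, with a true treatment effect of 2.0. It compares a naive difference in means with 1-nearest-neighbour matching (with replacement) on the confounder alone. All function and variable names are illustrative assumptions.

```python
import bisect
import math
import random


def simulate(n, seed=0):
    """Hypothetical toy data, NOT the paper's synthetic benchmark.

    Each row is (x_conf, x_spur, x_out, t, y):
      x_conf: confounder, affects both treatment t and outcome y
      x_spur: spurious variable, unrelated to t and y
      x_out:  outcome-only predictor
    The true treatment effect is 2.0 for every unit.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x_conf = rng.gauss(0, 1)
        x_spur = rng.gauss(0, 1)
        x_out = rng.gauss(0, 1)
        p_treat = 1 / (1 + math.exp(-1.5 * x_conf))  # logistic propensity
        t = 1 if rng.random() < p_treat else 0
        y = 2.0 * t + 3.0 * x_conf + x_out + rng.gauss(0, 1)
        rows.append((x_conf, x_spur, x_out, t, y))
    return rows


def naive_difference(rows):
    """Difference in mean outcomes, ignoring all covariates (confounded)."""
    treated = [r[4] for r in rows if r[3] == 1]
    control = [r[4] for r in rows if r[3] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def att_nn_match_1d(rows, feature=0):
    """ATT via 1-nearest-neighbour matching with replacement on one column."""
    controls = sorted((r[feature], r[4]) for r in rows if r[3] == 0)
    keys = [c[0] for c in controls]
    diffs = []
    for r in rows:
        if r[3] != 1:
            continue
        x = r[feature]
        i = bisect.bisect_left(keys, x)
        # The nearest control lies just left or just right of x in sorted order.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(keys)]
        j = min(candidates, key=lambda k: abs(keys[k] - x))
        diffs.append(r[4] - controls[j][1])
    return sum(diffs) / len(diffs)


rows = simulate(4000, seed=42)
naive = naive_difference(rows)
att = att_nn_match_1d(rows, feature=0)  # match on the confounder only
print(f"naive: {naive:.2f}  matched on confounder: {att:.2f}  (true: 2.0)")
```

In this toy setting the naive comparison is inflated by confounding through x_conf, while matching on that single column recovers an estimate near 2.0. The spurious and outcome-only columns are simply left out of the matching set, which is the kind of subset decision a feature selection procedure for causal inference has to make automatically.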