A Two-Stage Interpretable Matching Framework for Causal Inference
Journal: arXiv
Published Date: Apr 13, 2025
Abstract
Matching in causal inference from observational data aims to construct
treatment and control groups with similar distributions of covariates, thereby
reducing confounding and enabling unbiased estimation of treatment effects.
The resulting matched sample closely mimics a randomized controlled trial (RCT), thus
improving the quality of causal estimates. We introduce a novel Two-stage
Interpretable Matching (TIM) framework for transparent and interpretable
covariate matching. In the first stage, we perform exact matching across all
available covariates. For treatment and control units without an exact match in
the first stage, we proceed to the second stage. Here, we iteratively refine
the matching process by removing the least significant confounder in each
iteration and attempting exact matching on the remaining covariates. We learn a
distance metric for the dropped covariates to quantify closeness to the
treatment unit(s) within the corresponding strata. We use these high-quality
matches to estimate the conditional average treatment effects (CATEs). To
validate TIM, we conduct experiments on synthetic datasets with varying
association structures and correlations. We assess its performance by
measuring bias in CATE estimation and evaluating multivariate overlap between
treatment and control groups before and after matching. Additionally, we apply
TIM to a real-world healthcare dataset from the Centers for Disease Control and
Prevention (CDC) to estimate the causal effect of high cholesterol on diabetes.
Our results demonstrate that TIM improves CATE estimates, increases
multivariate overlap, and scales effectively to high-dimensional data, making
it a robust tool for causal inference in observational data.
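
The two-stage procedure described above can be illustrated with a minimal
Python sketch. It is not the authors' implementation, and it rests on several
assumptions the abstract does not settle: the treatment is binary (0/1), the
covariates are discrete, the order in which covariates are dropped
(`drop_order`, least important first) is supplied by the caller rather than
estimated, and the learned distance metric for dropped covariates is omitted
entirely. The function names `two_stage_match` and `stratum_effects`, and the
column names `T`, `Y`, `stratum`, and `stage`, are hypothetical.

```python
import pandas as pd


def two_stage_match(df, covariates, drop_order, treatment="T"):
    """Exact matching on all covariates, then on progressively fewer.

    drop_order lists covariates from least to most important; how that
    ranking is obtained is an assumption left to the caller.
    """
    remaining = df.copy()
    active = list(covariates)
    parts = []
    stage = 0  # stage 0 = exact match on every covariate
    while active and not remaining.empty:
        # A stratum is usable only if it holds both treated and control units.
        n_arms = remaining.groupby(active)[treatment].transform("nunique")
        hit = remaining[n_arms == 2].copy()
        if not hit.empty:
            keys = hit[active].astype(str).agg("|".join, axis=1)
            hit["stratum"] = str(stage) + ":" + keys
            hit["stage"] = stage
            parts.append(hit)
            remaining = remaining[n_arms < 2]
        # Drop the least important covariate that is still in use.
        nxt = next((c for c in drop_order if c in active), None)
        if nxt is None:
            break
        active.remove(nxt)
        stage += 1
    if not parts:
        return df.iloc[0:0].assign(stratum=[], stage=[])
    return pd.concat(parts)


def stratum_effects(matched, outcome="Y", treatment="T"):
    """Difference in mean outcome (treated minus control) per stratum."""
    means = (
        matched.groupby(["stratum", treatment])[outcome]
        .mean()
        .unstack(treatment)
    )
    return means[1] - means[0]
```

Called as `two_stage_match(df, ["age", "sex"], drop_order=["sex", "age"])`,
the sketch first looks for strata that agree on both covariates and then, for
units still unmatched, on `age` alone. The `stage` column records how many
covariates had been dropped when each unit was matched, which keeps the
provenance of every match visible. The per-stratum differences returned by
`stratum_effects` are only a simple stand-in for the CATE estimates mentioned
in the abstract.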