Assessing Surrogate Heterogeneity in Real World Data Using Meta-Learners
Journal: arXiv
Published Date: Apr 21, 2025
Abstract
Surrogate markers are most commonly studied within the context of randomized
clinical trials. However, the need for alternative outcomes extends beyond
these settings and may be more pronounced in real-world public health and
social science research, where randomized trials are often impractical.
Research on identifying surrogates in real-world non-randomized data is scarce,
as available statistical approaches for evaluating surrogate markers tend to
rely on the assumption that treatment is randomized. While the few methods that
allow for non-randomized treatment/exposure appropriately handle confounding
individual characteristics, they do not offer a way to examine surrogate
heterogeneity with respect to patient characteristics. In this paper, we
propose a framework to assess surrogate heterogeneity in real-world, i.e.,
non-randomized, data and implement this framework using various meta-learners.
Our approach allows us to quantify heterogeneity in surrogate strength with
respect to patient characteristics while accommodating confounders through the
use of flexible, off-the-shelf machine learning methods. In addition, we use
our framework to identify individuals for whom the surrogate is a valid
replacement for the primary outcome. We evaluate the performance of our methods
via a simulation study and an application examining heterogeneity in the
strength of hemoglobin A1c as a surrogate for fasting plasma glucose.
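
As a rough illustration of the idea, and not the authors' estimator, the sketch below applies a T-learner (one of the simplest meta-learners) to simulated non-randomized data, with ordinary least squares standing in for a flexible machine learning base learner. It assumes, purely for illustration, that surrogate strength is measured by a conditional proportion of treatment effect explained, PTE(x) = 1 - Delta_S(x) / Delta(x), where Delta(x) is the treatment effect on the primary outcome given X = x and Delta_S(x) is the residual effect that would remain if the surrogate under treatment were distributed as it is under control. The data-generating model, the helper `fit_linear`, and all variable names are hypothetical.

```python
import numpy as np

def fit_linear(features, target):
    """Least-squares fit with an intercept; returns a prediction function.
    Hypothetical stand-in for any off-the-shelf base learner."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return lambda new: np.column_stack([np.ones(len(new)), new]) @ coef

# Simulated data (assumed model, for illustration only).
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(0, 1, n)                          # patient characteristic
a = rng.binomial(1, 0.3 + 0.4 * x)                # treatment depends on x (confounding)
s = x + a + rng.normal(0, 0.5, n)                 # surrogate marker
y = 2 * s + a * x + x + rng.normal(0, 0.5, n)     # primary outcome

treated, control = a == 1, a == 0

# T-learner: one outcome model per treatment arm, conditioning on x.
mu1 = fit_linear(x[treated], y[treated])
mu0 = fit_linear(x[control], y[control])

# Outcome model given (x, s) among the treated, evaluated at the controls'
# surrogate values, then regressed on x to average over S(0) given x.
m1 = fit_linear(np.column_stack([x[treated], s[treated]]), y[treated])
pseudo = m1(np.column_stack([x[control], s[control]]))
g = fit_linear(x[control], pseudo)

def pte(x_new):
    """Estimated proportion of treatment effect explained at x_new."""
    delta = mu1(x_new) - mu0(x_new)       # overall effect Delta(x)
    delta_s = g(x_new) - mu0(x_new)       # residual effect Delta_S(x)
    return 1.0 - delta_s / delta

x_grid = np.array([0.2, 0.8])
pte_hat = pte(x_grid)
print(dict(zip(x_grid, np.round(pte_hat, 2))))
```

In this simulation the true PTE(x) equals 2 / (2 + x), so the surrogate is stronger for individuals with small x. In practice the least-squares fits would be replaced by a more flexible learner, and other meta-learners could be substituted for the T-learner.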