Machine Learning Methods for Estimating Personalized Treatment Effects: Insights on validity from two large trials.
Journal:
American Journal of Epidemiology
Published Date:
Mar 20, 2026
Abstract
Machine learning (ML) methods have the potential to improve precision medicine by estimating personalized treatment effects. However, formal validation of these methods remains limited, leaving their reliability in empirical settings largely uncertain. In this study, we evaluated the internal and external validity of 17 causal heterogeneity ML methods (including metalearners, tree-based methods, and deep learning methods) using data from two large randomized controlled trials: the International Stroke Trial (n = 19 435) and the Chinese Acute Stroke Trial (n = 21 106). We assessed performance using three visual metrics and three quantitative metrics. None of the ML methods consistently demonstrated reliable performance, either internally or externally. Heterogeneous treatment effects estimated from training data failed to generalize to the test data, even in the absence of distribution shifts. These results raise concerns about the current applicability of ML models in precision medicine and highlight the need for more robust validation techniques to ensure generalizability.
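To make the idea of checking estimated treatment effects on held-out data concrete, the following is a minimal sketch, not the authors' pipeline. It fits a T-learner (one kind of metalearner) with plain least squares on simulated randomized data, then compares predicted effects with observed arm differences in a test split. The data, the constant true effect of 0.5, the tertile grouping, and all function names are assumptions made for illustration; the abstract does not specify which metrics or implementations the study used.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares with an intercept column.
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict_linear(coef, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return Xb @ coef

def t_learner_cate(X_train, t_train, y_train, X_new):
    # T-learner: separate outcome models for the treated and control arms;
    # the estimated effect is the difference of their predictions.
    coef1 = fit_linear(X_train[t_train == 1], y_train[t_train == 1])
    coef0 = fit_linear(X_train[t_train == 0], y_train[t_train == 0])
    return predict_linear(coef1, X_new) - predict_linear(coef0, X_new)

def observed_effect_by_group(cate_hat, t, y, n_groups=3):
    # Group subjects by quantiles of the predicted effect, then compute the
    # observed treated-minus-control mean outcome difference in each group.
    edges = np.quantile(cate_hat, np.linspace(0, 1, n_groups + 1))
    groups = np.clip(
        np.searchsorted(edges, cate_hat, side="right") - 1, 0, n_groups - 1
    )
    effects = []
    for g in range(n_groups):
        m = groups == g
        effects.append(y[m & (t == 1)].mean() - y[m & (t == 0)].mean())
    return np.array(effects)

# Simulated trial (assumption): randomized treatment, constant true effect,
# so any apparent heterogeneity in the estimates is noise.
rng = np.random.default_rng(0)
n, p = 4000, 5
X = rng.normal(size=(n, p))
t = rng.integers(0, 2, size=n)
true_effect = 0.5
y = X[:, 0] + true_effect * t + rng.normal(size=n)

# Train on the first half, evaluate on the second half.
half = n // 2
cate_test = t_learner_cate(X[:half], t[:half], y[:half], X[half:])
group_effects = observed_effect_by_group(cate_test, t[half:], y[half:])

print("mean predicted effect on test data:", cate_test.mean())
print("observed effect by predicted-effect tertile:", group_effects)
```

If the predicted heterogeneity were real, the observed differences would be expected to increase from the lowest to the highest predicted-effect group. In this simulation the true effect is constant, so any ordering that appears reflects chance, which is the kind of failure to generalize that a train/test comparison is meant to expose.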