Fast Interpretable Greedy-Tree Sums.

Journal: Proceedings of the National Academy of Sciences of the United States of America
PMID:

Abstract

Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the Classification and Regression Trees (CART) algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS adapts to additive structure while remaining highly interpretable. Experiments on real-world datasets show FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding decision-making. Specifically, we introduce a variant of FIGS known as Group Probability-Weighted Tree Sums (G-FIGS) that accounts for heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. Theoretically, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that tree-sum models leverage disentanglement to generalize more efficiently than single tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS performs competitively with random forests and XGBoost on real-world datasets.
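The idea of growing several trees in summation can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so everything below is an assumption made for illustration: at each step the sketch compares splitting a leaf of an existing tree against starting a new tree on the current residuals, and keeps whichever split most reduces squared error. The names `best_split`, `fit_tree_sum`, and `predict` are hypothetical and are not taken from the paper or from any library.

```python
def mean(vals):
    return sum(vals) / len(vals)

def new_leaf(idx, r):
    # A leaf stores its training rows and the mean residual over those rows.
    return {"idx": idx, "value": mean([r[i] for i in idx]), "feature": None}

def predict_tree(node, x):
    while node["feature"] is not None:
        node = node["left"] if x[node["feature"]] <= node["thr"] else node["right"]
    return node["value"]

def predict(trees, x):
    # The model output is the SUM of the per-tree predictions.
    return sum(predict_tree(t, x) for t in trees)

def leaves(node):
    if node["feature"] is None:
        return [node]
    return leaves(node["left"]) + leaves(node["right"])

def best_split(X, r, idx):
    """Best axis-aligned split of rows idx on residuals r: (gain, feature, threshold)."""
    def sse(rows):
        m = mean([r[i] for i in rows])
        return sum((r[i] - m) ** 2 for i in rows)
    base, best = sse(idx), (0.0, None, None)
    for j in range(len(X[0])):
        # Dropping the largest value keeps both children non-empty.
        for t in sorted({X[i][j] for i in idx})[:-1]:
            left = [i for i in idx if X[i][j] <= t]
            right = [i for i in idx if X[i][j] > t]
            gain = base - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, j, t)
    return best

def fit_tree_sum(X, y, max_splits):
    n, trees = len(y), []
    for _ in range(max_splits):
        total = [predict(trees, X[i]) for i in range(n)]
        candidates = []  # (gain, feature, threshold, leaf, residuals, owning tree)
        for t in trees:
            # Residuals with tree t's own contribution added back.
            r = [y[i] - total[i] + predict_tree(t, X[i]) for i in range(n)]
            for leaf in leaves(t):
                gain, j, thr = best_split(X, r, leaf["idx"])
                if j is not None:
                    candidates.append((gain, j, thr, leaf, r, t))
        # Candidate: the root of a brand-new tree fitted to the full residuals.
        r = [y[i] - total[i] for i in range(n)]
        root = new_leaf(list(range(n)), r)
        gain, j, thr = best_split(X, r, root["idx"])
        if j is not None:
            candidates.append((gain, j, thr, root, r, None))
        if not candidates:
            break  # no split reduces the error any further
        gain, j, thr, leaf, r, tree = max(candidates, key=lambda c: c[0])
        leaf["feature"], leaf["thr"] = j, thr
        leaf["left"] = new_leaf([i for i in leaf["idx"] if X[i][j] <= thr], r)
        leaf["right"] = new_leaf([i for i in leaf["idx"] if X[i][j] > thr], r)
        if tree is None:
            trees.append(leaf)
        else:
            for lf in leaves(tree):  # refresh every leaf value of the grown tree
                lf["value"] = mean([r[i] for i in lf["idx"]])
    return trees

# Toy additive target: y = x0 + 2 * x1 on the four corners of the unit square.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 2, 1, 3]
trees = fit_tree_sum(X, y, max_splits=2)
preds = [predict(trees, x) for x in X]
```

On this toy additive target, the sketch uses its two splits to build two one-split trees, one per feature, and their sum reproduces `y` exactly. A single tree would need three splits to separate the same four points, which is the kind of additive structure the abstract says tree sums can adapt to.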

Authors

  • Yan Shuo Tan
    Department of Statistics and Data Science, National University of Singapore, Singapore 119077, Republic of Singapore.
  • Chandan Singh
  • Keyan Nasseri
    Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720.
  • Abhineet Agarwal
    Statistics Department, University of California, Berkeley, CA 94720.
  • James Duncan
    Department of Radiology and Biomedical Imaging, Yale University, New Haven, Connecticut, USA.
  • Omer Ronen
    Statistics Department, University of California, Berkeley, CA 94720.
  • Matthew Epland
    Overjet, New York, NY.
  • Aaron Kornblith
    Department of Emergency Medicine, University of California, San Francisco, CA 94113.
  • Bin Yu
    Department of Anesthesiology, Peking University First Hospital, Ningxia Women's and Children's Hospital, Yinchuan, China.