Accuracy and transportability of machine learning models for adolescent suicide prediction with longitudinal clinical records.

Journal: Translational Psychiatry
PMID:

Abstract

Machine learning models trained on real-world data have shown promise in predicting suicide attempts in adolescents. However, their transportability, namely the performance of a model trained on one dataset and applied to different data, is largely unknown, which hinders clinical adoption of these models. Here we developed machine learning-based suicide prediction models from real-world data collected in different contexts (inpatient, outpatient, and all encounters) for different purposes (administrative claims and electronic health records), and compared their cross-data performance. The three datasets were the All-Payer Claims Database in Connecticut, the Hospital Inpatient Discharge Database in Connecticut, and the Electronic Health Records data provided by the Kansas Health Information Network. We included 285,320 patients, among whom we identified 3,389 (1.2%) suicide attempters; 66% of the suicide attempters were female. Each model was evaluated on the source dataset on which it was trained and then applied to the target datasets. More complex models, particularly deep long short-term memory neural network models, did not outperform simpler regularized logistic regression models in either local or transported performance. Transported models exhibited varying performance, showing drops or even improvements relative to their source performance. Although transported models can achieve satisfactory performance, they are usually upper-bounded by the best locally developed models; they can nonetheless identify additional new cases in target data. Our study uncovers complex transportability patterns and could facilitate the development of suicide prediction models with better performance and generalizability.
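The source-versus-target comparison described in the abstract can be illustrated with a minimal sketch. It assumes discrimination is summarized by the area under the ROC curve (AUC); the abstract does not name the metric used, so this is an illustrative choice. The labels and scores are invented toy values, not study data, and the helper names `auc` and `transport_gap` are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def transport_gap(source_auc: float, target_auc: float) -> float:
    """Transported minus source performance; negative means a drop."""
    return target_auc - source_auc


# Toy example (invented numbers): scores from one hypothetical model,
# evaluated on held-out source data and on a different target dataset.
source_labels = [0, 0, 1, 1, 0, 1]
source_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
target_labels = [0, 1, 0, 1, 0, 0]
target_scores = [0.30, 0.60, 0.70, 0.20, 0.10, 0.50]

source_auc = auc(source_labels, source_scores)
target_auc = auc(target_labels, target_scores)
gap = transport_gap(source_auc, target_auc)
print(f"source AUC={source_auc:.3f}, target AUC={target_auc:.3f}, gap={gap:+.3f}")
```

A negative gap corresponds to the performance drop on transport mentioned in the abstract, and a positive gap to an improvement. Comparing the transported AUC with that of a model trained directly on the target data would give the local upper bound the abstract refers to.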

Authors

  • Chengxi Zang
    The Department of Population Health Sciences (Zang, Wang), Weill Cornell Medicine, New York, New York.
  • Yu Hou
    Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA.
  • Daoming Lyu
    Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, Cornell, USA.
  • Jun Jin
    Department of Statistics, University of Connecticut, Connecticut, USA.
  • Shane Sacco
    Department of Statistics, University of Connecticut, Connecticut, USA.
  • Kun Chen
    Department of Anesthesiology, Yongchuan Hospital of Chongqing Medical University, Chongqing, China.
  • Robert Aseltine
    University of Connecticut Health Center, Connecticut, USA. aseltine@uchc.edu.
  • Fei Wang
    Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY, United States.