Preprocessing of natural language process variables using a data-driven method improves the association with suicide risk in a large veterans affairs population.

Journal: Computers in Biology and Medicine
PMID:

Abstract

OBJECTIVE: Suicide risk assessment has historically relied heavily on clinical evaluations and patient self-reports. Natural language processing (NLP) of electronic health records (EHRs) provides an alternative approach for extracting risk predictors from clinical notes. Modeling NLP variables, however, is challenging because of zero inflation and skewed distributions. Therefore, we evaluated whether an adaptive-mixture-categorization (AMC) method could optimize the suicide risk predictive capacity of NLP data extracted from Veterans Affairs (VA) EHR notes.

Authors

  • Siting Li
    Department of Biomedical Data Science, Dartmouth College, Hanover, NH, USA.
  • Maxwell Levis
    White River Junction VA Medical Center, White River Junction, VT, USA.
  • Monica Dimambro
    White River Junction VA Medical Center, White River Junction, VT, USA.
  • Weiyi Wu
    Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA.
  • Joshua Levy
    Department of Pathology & Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
  • Brian Shiner
    White River Junction VA Medical Center, White River Junction, VT, USA.
  • Jiang Gui
    Geisel School of Medicine at Dartmouth, Hanover, NH, USA.