Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem.

Journal: European Journal of Epidemiology
Published Date:

Abstract

We developed algorithms to identify pregnant women with suicidal behavior using information extracted from clinical notes by natural language processing (NLP) in electronic medical records. Using both codified data and NLP applied to unstructured clinical notes, we first screened pregnant women in Partners HealthCare for suicidal behavior. Psychiatrists manually reviewed clinical charts to identify relevant features for suicidal behavior and to obtain gold-standard labels. Using the adaptive elastic net, we developed algorithms to classify suicidal behavior. We then validated the algorithms in an independent validation dataset. From 275,843 women with codes related to pregnancy or delivery, 9,331 women screened positive for suicidal behavior by either codified data (N = 196) or NLP (N = 9,145). Using expert-curated features, our algorithm achieved an area under the curve of 0.83. By setting a positive predictive value comparable to that of diagnostic codes related to suicidal behavior (0.71), we obtained a sensitivity of 0.34, specificity of 0.96, and negative predictive value of 0.83. The algorithm identified 1,423 pregnant women with suicidal behavior among the 9,331 women who screened positive. Mining unstructured clinical notes using NLP resulted in an 11-fold increase in the number of pregnant women identified with suicidal behavior, compared with sole reliance on diagnostic codes.
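
The thresholding step described in the abstract (fixing a target positive predictive value, then reading off sensitivity, specificity, and negative predictive value) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the toy scores and labels are hypothetical, and the scores stand in for the output of any classifier, such as one fitted with the adaptive elastic net.

```python
from typing import Dict, List, Optional


def confusion_metrics(scores: List[float], labels: List[int],
                      threshold: float) -> Dict[str, float]:
    """Classify score >= threshold as positive and compute four metrics."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
    }


def threshold_for_target_ppv(scores: List[float], labels: List[int],
                             target_ppv: float) -> Optional[float]:
    """Return the lowest observed score whose PPV meets the target.

    Sensitivity never increases as the threshold rises, so the lowest
    qualifying threshold keeps sensitivity as high as possible.
    """
    for candidate in sorted(set(scores)):
        if confusion_metrics(scores, labels, candidate)["ppv"] >= target_ppv:
            return candidate
    return None  # no threshold reaches the target PPV


# Hypothetical toy data: classifier scores and gold-standard labels.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

# 0.71 is the PPV target quoted in the abstract; everything else is made up.
threshold = threshold_for_target_ppv(toy_scores, toy_labels, target_ppv=0.71)
metrics = confusion_metrics(toy_scores, toy_labels, threshold)
print(threshold, metrics)
```

Fixing the positive predictive value first and accepting whatever sensitivity results mirrors the trade-off reported above, where matching the 0.71 of diagnostic codes yielded a sensitivity of 0.34.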

Authors

  • Qiu-Yue Zhong
    Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA. qyzhong@mail.harvard.edu.
  • Leena P Mittal
    Division of Women's Mental Health, Department of Psychiatry, Brigham and Women's Hospital, Boston, MA, USA.
  • Margo D Nathan
    Division of Women's Mental Health, Department of Psychiatry, Brigham and Women's Hospital, Boston, MA, USA.
  • Kara M Brown
    Division of Women's Mental Health, Department of Psychiatry, Brigham and Women's Hospital, Boston, MA, USA.
  • Deborah Knudson González
    Department of Psychiatry and Behavioral Neurosciences, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
  • Tianrun Cai
    Harvard-MIT Center for Regulatory Science, Harvard Medical School, Boston, MA, United States.
  • Sean Finan
    Boston Children's Hospital Informatics Program, Boston, MA.
  • Bizu Gelaye
    Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
  • Paul Avillach
    Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
  • Jordan W Smoller
  • Elizabeth W Karlson
    Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston.
  • Tianxi Cai
    Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States.
  • Michelle A Williams
    Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.