Automated identification of fall-related injuries in unstructured clinical notes.

Journal: American journal of epidemiology
PMID:

Abstract

Fall-related injuries (FRIs) are a major cause of hospitalizations among older patients, but identifying them in unstructured clinical notes poses challenges for large-scale research. In this study, we developed and evaluated natural language processing (NLP) models to address this issue. We utilized all available clinical notes from the Mass General Brigham health-care system for 2100 older adults, identifying 154 949 paragraphs of interest through automatic scanning for FRI-related keywords. Two clinical experts directly labeled 5000 paragraphs to generate benchmark-standard labels, while annotation of 3689 validated patterns indirectly labeled 93 157 paragraphs, yielding validated-standard labels. Five NLP models (vanilla bidirectional encoder representations from transformers [BERT], the robustly optimized BERT approach [RoBERTa], ClinicalBERT, DistilBERT, and a support vector machine [SVM]) were trained using 2000 benchmark paragraphs and all validated paragraphs. BERT-based models were trained in 3 stages: masked language modeling, general boolean question-answering, and question-answering for FRIs. For validation, 500 benchmark paragraphs were used, and the remaining 2500 were used for testing. The models were compared on precision, recall, F1 score, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPR), with RoBERTa showing the best performance. Its precision was 0.90 (95% CI, 0.88-0.91), recall was 0.91 (95% CI, 0.90-0.93), the F1 score was 0.91 (95% CI, 0.89-0.92), and the AUROC and AUPR were both 0.96 (95% CI, 0.95-0.97). These NLP models accurately identify FRIs in unstructured clinical notes and could make research based on clinical notes more efficient.
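The reported precision, recall, and F1 score with 95% confidence intervals can be illustrated with a minimal sketch. The abstract does not state how the intervals were obtained, so the percentile bootstrap used here is an assumption, as are the function names `precision_recall_f1` and `bootstrap_ci` and the toy labels; none of this is taken from the study's code.

```python
import random


def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 = FRI paragraph."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap intervals for precision, recall, and F1.

    Assumption: the source does not say which CI method was used; this
    resamples paragraphs with replacement and takes empirical percentiles.
    """
    rng = random.Random(seed)
    n = len(y_true)
    draws = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(
            precision_recall_f1([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    intervals = []
    for k in range(3):  # 0 = precision, 1 = recall, 2 = F1
        values = sorted(d[k] for d in draws)
        lower = values[int((alpha / 2) * n_resamples)]
        upper = values[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
        intervals.append((lower, upper))
    return intervals


# Hypothetical toy labels standing in for test paragraphs (not study data).
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
point_estimates = precision_recall_f1(toy_true, toy_pred)
toy_intervals = bootstrap_ci(toy_true, toy_pred, n_resamples=200)
```

On a held-out set such as the 2500 test paragraphs, the same two calls would be applied to the expert labels and the model's thresholded predictions; AUROC and AUPR would additionally require the model's predicted scores rather than hard labels.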

Authors

  • Wendong Ge
    Brigham and Women's Hospital.
  • Lilian M Godeiro Coelho
    Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, United States.
  • Maria A Donahue
    Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, United States.
  • Hunter J Rice
    Department of Neurology, Massachusetts General Hospital, Boston.
  • Deborah Blacker
    Department of Epidemiology, Harvard T. H. Chan School of Public Health and Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts.
  • John Hsu
    Department of Health Care Policy, Harvard Medical School, Boston, MA 02115, United States.
  • Joseph P Newhouse
    Department of Health Care Policy, Harvard Medical School, Boston, MA 02115, United States.
  • Sonia Hernandez-Diaz
    Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts.
  • Sebastien Haneuse
    Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States.
  • Brandon Westover
    Harvard Medical School, Boston, Massachusetts, USA.
  • Lidia M V R Moura
    Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, United States.