Identifying Diabetes in Clinical Notes in Hebrew: A Novel Text Classification Approach Based on Word Embedding.

Journal: Studies in health technology and informatics
Published Date:

Abstract

NimbleMiner is a word embedding-based, language-agnostic natural language processing system for clinical text classification. NimbleMiner was previously applied to English text; in this study, we applied it to a large sample of inpatient clinical notes in Hebrew to identify instances of diabetes mellitus. The study data included 521,278 clinical notes (one admission and one discharge note per patient) for 268,664 hospital admissions to medical-surgical units of a large hospital in Israel. NimbleMiner achieved good overall performance (F-score = 0.94) when tested on a gold-standard, human-annotated dataset of 800 clinical notes. We found 15% more patients with diabetes mentioned in the clinical notes than in the coded diagnosis data. This underreporting of diabetes in the coded diagnosis data highlights the urgent need for tools and algorithms that help busy providers identify a range of useful information, such as whether a patient has diabetes.

Authors

  • Maxim Topaz
    Division of General Internal Medicine and Primary Care, Brigham & Women's Hospital, Harvard Medical School, Boston, MA, USA.
  • Ludmila Murga
    Cheryl Spencer Department of Nursing, University of Haifa, Haifa, Israel.
  • Chagai Grossman
    Sheba Medical Center, Tel Hashomer, Israel.
  • Daniella Daliyot
    Sheba Medical Center, Tel Hashomer, Israel.
  • Shlomit Jacobson
    Sheba Medical Center, Tel Hashomer, Israel.
  • Noa Rozendorn
    Sheba Medical Center, Tel Hashomer, Israel.
  • Eyal Zimlichman
    Sheba Medical Center, Tel Hashomer, Israel.
  • Nadav Furie
    Sheba Medical Center, Tel Hashomer, Israel.