Benchmarking Transformer-Based Models for Identifying Social Determinants of Health in Clinical Notes.

Journal: Proceedings. IEEE International Conference on Healthcare Informatics
Published Date:

Abstract

Electronic health records (EHRs) are widely used to build machine learning models for health outcome prediction. However, many EHR-based models are inherently biased because they lack risk factors related to social determinants of health (SDoH), which account for up to 40% of preventable deaths. Because SDoH information is often captured in clinical notes, recent efforts have used natural language processing to extract it from notes and append it to structured data. In this work, we benchmark seven pre-trained transformer-based models (BERT, ALBERT, BioBERT, BioClinicalBERT, RoBERTa, ELECTRA, and RoBERTa-MIMIC-Trial) for recognizing SDoH terms, using a previously annotated corpus of MIMIC-III clinical notes. Our study shows that the BioClinicalBERT model performs best, with F1 scores of 0.911 and 0.923 under the strict and relaxed criteria, respectively. This work shows the promise of transformer-based models for recognizing SDoH information in clinical notes.
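
The abstract reports F1 under strict and relaxed criteria without defining them. The sketch below illustrates one common convention for entity-level evaluation and is an assumption, not necessarily the scoring used in the paper: a strict match requires identical span boundaries and label, while a relaxed match only requires overlapping spans with the same label. The `Span` layout, the `span_f1` helper, and the `"SDOH"` label are hypothetical names introduced for illustration.

```python
from typing import List, Tuple

# Hypothetical span layout: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def _matches(pred: Span, gold: Span, relaxed: bool) -> bool:
    """Return True if a predicted span counts as a hit for a gold span."""
    if pred[2] != gold[2]:
        return False  # labels must agree under both criteria
    if relaxed:
        # Assumed relaxed criterion: any overlap between the two spans
        return pred[0] < gold[1] and gold[0] < pred[1]
    # Assumed strict criterion: identical boundaries
    return pred[0] == gold[0] and pred[1] == gold[1]


def span_f1(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """Entity-level F1 over one document's gold and predicted spans."""
    if not gold or not pred:
        return 0.0
    # Precision: share of predicted spans that match some gold span
    hits_pred = sum(any(_matches(p, g, relaxed) for g in gold) for p in pred)
    # Recall: share of gold spans matched by some predicted span
    hits_gold = sum(any(_matches(p, g, relaxed) for p in pred) for g in gold)
    precision = hits_pred / len(pred)
    recall = hits_gold / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 2, "SDOH"), (5, 6, "SDOH")]
    pred = [(0, 2, "SDOH"), (4, 6, "SDOH")]  # second span starts one token early
    print(span_f1(gold, pred, relaxed=False))
    print(span_f1(gold, pred, relaxed=True))
```

Under these assumptions, a prediction whose boundary is off by one token is penalized by the strict score but credited by the relaxed score, which is why the relaxed F1 is never lower than the strict F1 for the same predictions.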

Authors

  • Xiaoyu Wang
    Department of Statistics, Florida State University, Tallahassee, FL, USA.
  • Dipankar Gupta
    College of Medicine, University of Florida, Gainesville, FL, USA.
  • Michael Killian
    College of Social Work, Florida State University, Tallahassee, FL, USA.
  • Zhe He
    School of Information, Florida State University, Tallahassee, FL, USA.
