Using Natural Language Processing to Extract and Classify Symptoms Among Patients with Thyroid Dysfunction.

Journal: Studies in health technology and informatics
PMID:

Abstract

In the United States, more than 12% of the population will experience thyroid dysfunction. Symptoms often reported with thyroid dysfunction include fatigue and weight change. However, little is understood about the relationship between these symptoms, as documented in the outpatient setting, and ordering patterns for thyroid testing across patient groups defined by age and sex. We developed a natural language processing and deep learning pipeline to identify patient-reported outcomes of weight change and fatigue among patients with a thyroid-stimulating hormone test. We built upon prior work by comparing 5 open-source Bidirectional Encoder Representations from Transformers (BERT) models to determine which could accurately identify these symptoms from clinical texts. For both fatigue (f) and weight change (wc), Bio_ClinicalBERT achieved the highest F1-score (f: 0.900; wc: 0.906) compared to BERT (f: 0.899; wc: 0.890), DistilBERT (f: 0.852; wc: 0.912), Biomedical RoBERTa (f: 0.864; wc: 0.904), and PubMedBERT (f: 0.882; wc: 0.892).
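The models above are compared by F1-score, the harmonic mean of precision and recall. As a minimal sketch of how that metric is computed for a binary symptom label (for example, fatigue mentioned or not), the following uses invented toy labels; the function name `f1_score` and the `gold`/`pred` lists are illustrative assumptions and are not drawn from the study's data or code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (1 = symptom present in the note, 0 = absent).
gold = [1, 0, 1, 1, 0, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 1, 0, 0]

print(f"F1 = {f1_score(gold, pred):.3f}")
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1-score is 0.75.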

Authors

  • Sy Hwang
    Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania.
  • Sujatha Reddy
    University of Pennsylvania, Philadelphia, PA, USA.
  • Katherine Wainwright
    University of Pennsylvania, Philadelphia, PA, USA.
  • Emily Schriver
    Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania.
  • Anne Cappola
    University of Pennsylvania, Philadelphia, PA, USA.
  • Danielle Mowery
    Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, 84108 UT United States.