Development of a Human Evaluation Framework and Correlation with Automated Metrics for Natural Language Generation of Medical Diagnoses.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:

Abstract

In the evolving landscape of clinical Natural Language Generation (NLG), assessing the quality of abstractive text remains challenging, as existing methods often overlook the complexities of generative tasks. This work aimed to examine the current state of automated evaluation metrics for NLG in healthcare. To establish a robust, well-validated baseline against which to measure the alignment of these metrics, we created a comprehensive human evaluation framework. Using output generated by ChatGPT-3.5-turbo, we correlated human judgments with each metric. None of the metrics demonstrated high alignment; however, the SapBERT score, a metric based on the Unified Medical Language System (UMLS), showed the best results. This underscores the importance of incorporating domain-specific knowledge into evaluation efforts. Our work reveals the deficiency in quality evaluations for generated text and introduces our comprehensive human evaluation framework as a baseline. Future efforts should prioritize integrating medical knowledge databases to enhance the alignment of automated metrics, with particular focus on refining the SapBERT score for improved assessments.
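
As an illustration of what correlating human judgments with each automated metric can look like in practice, the following is a minimal sketch and not the authors' analysis. It assumes a Spearman rank correlation (the abstract does not state which coefficient was used), and the metric names and all scores are hypothetical toy values.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one human rating and one score per metric
# for each of five generated outputs. These are NOT study results.
human_scores = [1, 2, 3, 4, 5]
metric_scores = {
    "metric_a": [0.10, 0.30, 0.20, 0.60, 0.90],
    "metric_b": [0.50, 0.40, 0.45, 0.42, 0.48],
}

# Correlate the human judgments with each metric in turn.
correlations = {
    name: spearman(human_scores, scores)
    for name, scores in metric_scores.items()
}

for name, rho in sorted(correlations.items(), key=lambda kv: -kv[1]):
    print(f"{name}: rho = {rho:.2f}")
```

A rank-based coefficient is a common choice when human ratings are ordinal, since it only assumes that a better metric score should accompany a better human rating, not that the relationship is linear.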

Authors

  • Emma Croxford
    Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI 53792, United States.
  • Yanjun Gao
    Department of Biomedical Informatics, University of Colorado-Anschutz Medical, Aurora, CO 80045, United States.
  • Brian Patterson
    UW Health, Madison, WI 53726, United States.
  • Daniel To
    Health Sciences Division, Burn and Shock Trauma Research Institute, Stritch School of Medicine, Loyola University, Maywood, Illinois, USA.
  • Samuel Tesch
    School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin, USA.
  • Dmitriy Dligach
    Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, IL.
  • Anoop Mayampurath
    Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States.
  • Matthew M Churpek
    Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States.
  • Majid Afshar
    Loyola University Chicago, Chicago, IL.