Enhancing data quality in medical concept normalization through large language models.

Journal: Journal of biomedical informatics
PMID:

Abstract

OBJECTIVE: Medical concept normalization (MCN) aims to map informal medical terms to formal medical concepts, a critical task in building machine learning systems for medical applications. However, most existing studies on MCN focus on models and algorithms, often overlooking the vital role of data quality. This research evaluates MCN performance across varying data quality scenarios and investigates how to use these evaluation results to enhance data quality with large language models (LLMs), ultimately improving MCN performance. The effectiveness of the proposed approach is demonstrated through a case study.

Authors

  • Haihua Chen
    The Anuradha & Vikas Sinha Department of Data Science, University of North Texas, Denton, 76203, TX, USA. Electronic address: haihua.chen@unt.edu.
  • Ruochi Li
    Department of Computer Science, North Carolina State University, Raleigh, 27695, NC, USA. Electronic address: rli14@ncsu.edu.
  • Ana Cleveland
    Department of Information Science, University of North Texas, Denton, 76203, TX, USA. Electronic address: ana.cleveland@unt.edu.
  • Junhua Ding
    The Anuradha & Vikas Sinha Department of Data Science, University of North Texas, Denton, 76203, TX, USA. Electronic address: junhua.ding@unt.edu.