Enhancing data quality in medical concept normalization through large language models.

Journal: Journal of biomedical informatics

PMID: 40180205

Abstract

OBJECTIVE: Medical concept normalization (MCN) aims to map informal medical terms to formal medical concepts, a critical task in building machine learning systems for medical applications. However, most existing studies on MCN primarily focus on models and algorithms, often overlooking the vital role of data quality. This research evaluates MCN performance across varying data quality scenarios and investigates how to leverage these evaluation results to enhance data quality, ultimately improving MCN performance through the use of large language models (LLMs). The effectiveness of the proposed approach is demonstrated through a case study.

Authors

Haihua Chen

The Anuradha & Vikas Sinha Department of Data Science, University of North Texas, Denton, 76203, TX, USA. Electronic address: haihua.chen@unt.edu.

Ruochi Li

Department of Computer Science, North Carolina State University, Raleigh, 27695, NC, USA. Electronic address: rli14@ncsu.edu.

Ana Cleveland

Department of Information Science, University of North Texas, Denton, 76203, TX, USA. Electronic address: ana.cleveland@unt.edu.

Junhua Ding

The Anuradha & Vikas Sinha Department of Data Science, University of North Texas, Denton, 76203, TX, USA. Electronic address: junhua.ding@unt.edu.

Keywords

Algorithms; Data Accuracy; Humans; Language; Large Language Models; Machine Learning; Medical Informatics; Natural Language Processing
