LCDL: Classification of ICD codes based on disease label co-occurrence dependency and LongFormer with medical knowledge.

Journal: Artificial intelligence in medicine

PMID: 39667117

Abstract

Medical coding assigns codes to clinical free-text documents, specifically medical records that average over 3,000 tokens, in order to track patient diagnoses and treatments. This is typically done manually by healthcare professionals. To improve efficiency and accuracy while reducing the workload on these professionals, researchers have treated the task as multi-label classification. The tens of thousands of ICD codes follow a long-tail distribution: only a few codes (representing common diseases) are assigned frequently, while the majority (representing rare diseases) are assigned infrequently. This paper presents LCDL, a model that addresses this challenge by drawing on the LongFormer pre-trained language model and a disease label co-occurrence map. To further improve automated medical coding in the biomedical domain, hierarchies carrying medical knowledge, together with synonyms and abbreviations, are introduced to improve the medical knowledge representation. Extensive evaluations on the MIMIC-III benchmark dataset show that LCDL achieves competitive performance compared with previous state-of-the-art methods.
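The abstract does not say how the disease label co-occurrence map is built. As an illustration only, the sketch below assumes the simplest reading: count how often each pair of ICD codes is assigned to the same record, then estimate the conditional probability of one code given another. The function names, the toy records, and the use of raw conditional frequencies are all assumptions made for this example, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count per-code frequency and per-pair co-occurrence over records.

    `records` is an iterable of code lists, one list per medical record.
    Pairs are stored as sorted tuples so (a, b) and (b, a) share one key.
    """
    code_counts = Counter()
    pair_counts = Counter()
    for codes in records:
        unique = sorted(set(codes))  # ignore duplicate codes within a record
        code_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return code_counts, pair_counts


def conditional_prob(code_counts, pair_counts, a, b):
    """Estimate P(b | a): share of records containing `a` that also contain `b`."""
    if code_counts[a] == 0:
        return 0.0
    pair = tuple(sorted((a, b)))
    return pair_counts[pair] / code_counts[a]


# Hypothetical toy data: each inner list is the code set of one record.
records = [
    ["401.9", "428.0", "427.31"],
    ["401.9", "428.0"],
    ["401.9", "250.00"],
    ["038.9"],
]
code_counts, pair_counts = cooccurrence_counts(records)
```

In this toy data, a frequent code such as "401.9" co-occurs with several others, while a rare code such as "038.9" has no co-occurrence evidence at all, which is one way the long-tail problem shows up in a map of this kind.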

Authors

Yumeng Yang

School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China. Electronic address: yumeng.yang@dlut.edu.cn.
Hongfei Lin
Zhihao Yang

College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
Yijia Zhang

School of Computer Science and Technology, Dalian University of Technology, Dalian, China.
Di Zhao
Ling Luo

Department of Epidemiology and Medical Statistics, School of Public Health, Guangdong Medical University, Dongguan, Guangdong, China.

Keywords

Clinical Coding; Electronic Health Records; Humans; International Classification of Diseases; Natural Language Processing
