LCDL: Classification of ICD codes based on disease label co-occurrence dependency and LongFormer with medical knowledge.

Journal: Artificial intelligence in medicine
PMID:

Abstract

Medical coding assigns codes to clinical free-text documents, specifically medical records that average over 3,000 tokens, in order to track patient diagnoses and treatments. This is typically done manually by healthcare professionals. To improve efficiency and accuracy while reducing the workload on these professionals, researchers have framed the task as multi-label classification. ICD codes number in the tens of thousands and follow a long-tail distribution: a few codes (representing common diseases) are assigned frequently, while the majority (representing rare diseases) are assigned infrequently. This paper presents the LCDL model, which addresses this challenge by drawing on the LongFormer pre-trained language model and a disease label co-occurrence map. To enhance the performance of automated medical coding in the biomedical domain, hierarchies with medical knowledge, synonyms, and abbreviations are introduced to improve the medical knowledge representation. Extensive evaluations on the benchmark dataset MIMIC-III show that the model achieves competitive performance compared with previous state-of-the-art methods.
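
The abstract does not say how the disease label co-occurrence map is built. As an illustration only, and not the authors' method, the following minimal Python sketch counts how often two codes are assigned to the same record, which is one common way to obtain such a map. The function name `cooccurrence_counts` and the toy records are hypothetical.

```python
from itertools import combinations


def cooccurrence_counts(records):
    """Count how often each unordered pair of codes shares a record.

    `records` is an iterable of code lists, one list per medical record.
    Returns a dict mapping a sorted (code_a, code_b) tuple to its count.
    """
    counts = {}
    for codes in records:
        # De-duplicate and sort so each pair has a single canonical key.
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


# Hypothetical toy data: each inner list holds the codes of one record.
records = [
    ["401.9", "250.00", "428.0"],
    ["401.9", "250.00"],
    ["428.0", "401.9"],
]
counts = cooccurrence_counts(records)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Pairs that never appear together are simply absent from the dictionary. With tens of thousands of codes, this sparse form is far smaller than a dense code-by-code matrix.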

Authors

  • Yumeng Yang
    School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China. Electronic address: yumeng.yang@dlut.edu.cn.
  • Hongfei Lin
  • Zhihao Yang
    College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.
  • Yijia Zhang
    School of Computer Science and Technology, Dalian University of Technology, Dalian, China.
  • Di Zhao
  • Ling Luo
    Department of Epidemiology and Medical Statistics School of Public Health, Guangdong Medical University, Dongguan, Guangdong, China.