Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text.

Journal: JMIR Medical Informatics
PMID:

Abstract

BACKGROUND: Machine learning models can reduce the burden on doctors by converting medical records into International Classification of Diseases (ICD) codes in real time, thereby improving the efficiency of diagnosis and treatment. However, automated ICD coding faces challenges such as small datasets, diverse writing styles, unstructured records, and the need for semimanual preprocessing. Existing approaches, such as naive Bayes, Word2Vec, and convolutional neural networks, are limited in their ability to handle missing values and to understand the context of medical texts, which leads to high error rates. We developed a fully automated pipeline based on the Key-bidirectional encoder representations from transformers (BERT) approach, with continued pretraining on large-scale medical records, that converts long free text into standard ICD codes. Adjustable settings, such as mixed templates and soft verbalizers, allow the model to adapt to different requirements and enable task-specific prompt learning.

Authors

  • Yan Zhuang
    Medical Psychology Department, Taiyuan Mental Hospital, Taiyuan, China.
  • Junyan Zhang
  • Xiuxing Li
    Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS), Beijing, China.
  • Chao Liu
    Anti-Drug Technology Center of Guangdong Province, National Anti-Drug Laboratory Guangdong Regional Center, Guangzhou 510230, China.
  • Yue Yu
    Department of Mathematics, Lehigh University, Bethlehem, PA, USA.
  • Wei Dong
    Department of Cardiology, Chinese PLA General Hospital, Beijing, China.
  • Kunlun He
    Beijing Key Laboratory of Precision Medicine for Chronic Heart Failure, Chinese PLA General Hospital, Beijing, China.