ICD code mapping model based on clinical text tree structure.

Journal: Artificial intelligence in medicine
Published Date:

Abstract

With the rapid development of big data and artificial intelligence technology, ICD coding of electronic medical records has been effectively addressed. Deep learning methods, which replace manual coding, have improved both the quality and the efficiency of coding. However, they still face challenges, such as weak and ambiguous semantic representations of clinical record text and a failure to consider the structural characteristics of clinical records. To address these problems, this study proposes an ICD coding model, TRansformer and TRee-LSTM for ICD Coding (TRIC), which performs automatic ICD coding of unstructured clinical records. In this model, the structure and the features of clinical records are extracted by a constituency tree model and a Transformer-based model, respectively, and a Tree-LSTM model is used to enrich the features. A pre-trained BioBERT model is then used to emphasize key ICD code information and improve matching performance. Finally, a fully connected neural network classifier performs the classification, realizing a many-to-many mapping between clinical records and ICD codes. On the widely used MIMIC-III full dataset and a sample dataset, TRIC is compared with 12 baseline models. It obtains best results of 0.586, 0.109, 0.989, 0.937 and 0.758 for micro-F1 (MiF), macro-F1 (MaF), micro-AUC (MiAUC), macro-AUC (MaAUC) and precision at 8 (P@8), respectively, verifying that TRIC can effectively improve the quality of automatic ICD coding.
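The abstract reports multi-label metrics (MiF, MaF, P@8) without defining them. The following is a minimal, hypothetical pure-Python sketch of how these are commonly computed for ICD coding. It is not the TRIC authors' evaluation code. Its assumptions are: labels and predictions are binary record-by-code matrices; predicted scores are binarized at a hypothetical threshold of 0.5; a code's F1 is taken as 0 when it has no true and no predicted positives; and macro-F1 is the unweighted mean of per-code F1 scores. Some ICD-coding evaluation scripts instead take the harmonic mean of macro-precision and macro-recall. The AUC metrics are omitted.

```python
"""Illustrative multi-label ICD-coding metrics: micro-F1, macro-F1, P@k.

Hypothetical sketch, not the TRIC authors' code. Rows are clinical records,
columns are ICD codes; y_true / y_pred hold 0/1 entries.
"""
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]


def binarize(scores: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn classifier scores into 0/1 predictions (threshold 0.5 is an assumption)."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def label_counts(y_true: Matrix, y_pred: Matrix, j: int) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for ICD code column j across all records."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        t, p = t_row[j], p_row[j]
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 when the denominator is 0 (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Pool TP/FP/FN over every (record, code) pair, then compute a single F1."""
    counts = [label_counts(y_true, y_pred, j) for j in range(len(y_true[0]))]
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1(tp, fp, fn)


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of per-code F1, so rare codes weigh as much as frequent ones."""
    n_codes = len(y_true[0])
    return sum(f1(*label_counts(y_true, y_pred, j)) for j in range(n_codes)) / n_codes


def precision_at_k(y_true: Matrix, scores: Sequence[Sequence[float]], k: int = 8) -> float:
    """Mean over records of (number of true codes among the k top-scored codes) / k."""
    total = 0.0
    for t_row, s_row in zip(y_true, scores):
        top_k = sorted(range(len(s_row)), key=lambda j: s_row[j], reverse=True)[:k]
        total += sum(t_row[j] for j in top_k) / k
    return total / len(y_true)


if __name__ == "__main__":
    # Toy data: 2 records, 4 codes (far smaller than MIMIC-III, so k=2 instead of 8).
    y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
    scores = [[0.9, 0.2, 0.4, 0.7], [0.1, 0.8, 0.3, 0.6]]
    y_pred = binarize(scores)
    print(round(micro_f1(y_true, y_pred), 4))  # 0.5714
    print(macro_f1(y_true, y_pred))            # 0.5
    print(precision_at_k(y_true, scores, k=2))  # 0.5
```

The gap between MiF and MaF in the reported results (0.586 vs 0.109) is what this construction predicts for a long-tailed code distribution. Micro-averaging is dominated by frequent codes, while macro-averaging gives every rare code equal weight, and rare codes are usually predicted poorly.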

Authors

  • Jingjin Xue
    School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China.
  • Pengli Lu
    School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China. Electronic address: lupengli88@163.com.