A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records.

Journal: BMC Medical Informatics and Decision Making

Abstract

BACKGROUND: Named Entity Recognition (NER), a key step in extracting health information, faces many challenges in Chinese Electronic Medical Records (EMRs). First, casual use of Chinese abbreviations and doctors' personal writing styles can produce multiple expressions of the same entity, and no common Chinese medical dictionary is available to support accurate entity extraction. Second, EMRs contain entities from many different categories, and entity length varies greatly across categories, which makes Chinese NER more difficult. Entity boundary detection therefore becomes the key to accurate entity extraction from Chinese EMRs, and we need a model that can recognize entities of varying lengths without relying on any medical dictionary.

Authors

  • Xiaoling Cai
    Communication & Computer Network Lab of Guangdong, School of Computer Science and Engineering, South China University of Technology, Guangzhou, China.
  • Shoubin Dong
    School of Computer Science and Engineering, South China University of Technology, Guangzhou, China.
  • Jinlong Hu
    School of Computer Science and Engineering, South China University of Technology, Guangzhou, China.