Online biomedical named entities recognition by data and knowledge-driven model.

Journal: Artificial intelligence in medicine
Published Date:

Abstract

Named entity recognition (NER) is an important task in the natural language processing of biomedical text. Currently, most NER studies focus on standardized biomedical text, while NER for unstandardized biomedical text draws less attention from researchers. Named entities in online biomedical text often contain errors and appear in variant forms (polymorphisms), which degrade the performance of NER models and impede support from knowledge representation methods. In this paper, we propose a neural network method that can effectively recognize entities in unstandardized online medical/health text. We introduce a new pre-training scheme that uses large-scale online question-answering pairs to enhance the capacity of transformer models on online biomedical text. Moreover, we supply the model with knowledge representations, called multi-channel knowledge labels, derived from a knowledge base; this method overcomes the restriction in languages such as Chinese that require word segmentation tools to represent knowledge. In experiments on a dataset for Chinese online medical entity recognition, our model significantly outperforms the baseline methods and achieves state-of-the-art results.

Authors

  • Lulu Cao
    Department of Rheumatology and Immunology, Peking University People's Hospital, 100044, China.
  • Chaochen Wu
    National Laboratory of Pattern Recognition, Institute of Automation, CAS, 95 Zhongguancun East Road, Beijing 100190, China.
  • Guan Luo
    National Laboratory of Pattern Recognition, Institute of Automation, CAS, 95 Zhongguancun East Road, Beijing 100190, China. Electronic address: gluo@nlpr.ia.ac.cn.
  • Chao Guo
    Department of Cardiology, Fuwai Hospital CAMS and PUMC, Beijing 100037, China.
  • Anni Zheng
    National Laboratory of Pattern Recognition, Institute of Automation, CAS, 95 Zhongguancun East Road, Beijing 100190, China.