A Unified Model Using Distantly Supervised Data and Cross-Domain Data in NER.

Journal: Computational intelligence and neuroscience
Published Date:

Abstract

Named entity recognition (NER) systems are often built with supervised methods that require large amounts of hand-annotated data. When hand-annotated data is limited, distantly supervised (DS) data and cross-domain (CD) data are usually used separately to improve performance. Distantly supervised data provides in-domain dictionary information, while cross-domain data provides hand-annotated information from another domain. These two types of information are complementary. However, two problems must be solved before these data can be used directly. First, the distantly supervised data may contain a lot of noise. Second, directly using cross-domain data may degrade performance because of distribution mismatch. In this paper, we propose a unified model named PARE (PArtial learning and REinforcement learning), which can use distantly supervised data and cross-domain data simultaneously as external data. The model uses partial learning with a new label strategy to better handle the noise in distantly supervised data, and reinforcement learning to alleviate the distribution mismatch in cross-domain data. Experiments on three datasets show that our model outperforms the baseline models. In addition, our model can be used when no hand-annotated in-domain data is available.
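
To make the idea of dictionary-based distant supervision with partial labels concrete, the following is a minimal sketch and not the PARE label strategy itself, which the abstract does not specify. It assumes a hypothetical three-tag scheme (`O`, `B-ENT`, `I-ENT`), a hypothetical helper `distant_labels`, and a toy dictionary. Tokens matched by the dictionary receive a single tag; unmatched tokens keep the full tag set, marking them as unknown rather than as confirmed non-entities.

```python
from typing import List, Set

# Hypothetical tag set, chosen only for illustration.
ALL_TAGS: Set[str] = {"O", "B-ENT", "I-ENT"}


def distant_labels(tokens: List[str], dictionary: Set[str],
                   max_len: int = 5) -> List[Set[str]]:
    """Return, for each token, the set of tags it is allowed to take.

    Greedy longest-match lookup against a lowercase entity dictionary.
    Matched spans get one certain tag per token; everything else stays
    fully ambiguous, so a partial-learning objective can sum over all
    tag sequences compatible with these sets.
    """
    allowed = [set(ALL_TAGS) for _ in tokens]
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]).lower() in dictionary:
                match_len = n
                break
        if match_len:
            allowed[i] = {"B-ENT"}
            for j in range(i + 1, i + match_len):
                allowed[j] = {"I-ENT"}
            i += match_len
        else:
            i += 1
    return allowed


if __name__ == "__main__":
    sentence = ["Aspirin", "reduces", "heart", "attack", "risk"]
    entity_dict = {"aspirin", "heart attack"}  # toy dictionary (assumption)
    for token, tags in zip(sentence, distant_labels(sentence, entity_dict)):
        print(token, sorted(tags))
```

Leaving unmatched tokens ambiguous is one common way to limit the noise that an incomplete dictionary introduces: a missed entity is not forced to the `O` tag during training.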

Authors

  • Yun Hu
    Department of Radiology, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, 26 Shengli Avenue, Jiangan, Wuhan, 430014, Hubei, China.
  • Hao He
    School of Aerospace Engineering, Xiamen University, Xiamen 361005, P. R. China.
  • Zhengfei Chen
    Shenzhen Power Supply Bureau Co., Ltd., Shenzhen 518001, China.
  • Qingmeng Zhu
    Institute of Software, Chinese Academy of Sciences, Haidian, Beijing 100190, China.
  • Changwen Zheng
    Institute of Software, Chinese Academy of Sciences, Haidian, Beijing 100190, China.