Augmenting biomedical named entity recognition with general-domain resources.

Journal: Journal of biomedical informatics
Published Date:

Abstract

OBJECTIVE: Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotation. Several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, but this approach does not consistently improve performance and may introduce label ambiguity across biomedical corpora. We aim to tackle these challenges through transfer learning from easily accessible resources that have fewer concept overlaps with biomedical datasets.

Authors

  • Yu Yin
  • Hyunjae Kim
    Department of Computer Science and Engineering, Korea University, Seoul, South Korea.
  • Xiao Xiao
    George Washington University.
  • Chih Hsuan Wei
    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD-20894, United States.
  • Jaewoo Kang
    Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea.
  • Zhiyong Lu
    National Center for Biotechnology Information, Bethesda, MD 20894 USA.
  • Hua Xu
    Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Meng Fang
    Ministry of Education Key Laboratory of Bioinformatics, Tsinghua University, Beijing, China.
  • Qingyu Chen
    Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA.