A study of active learning methods for named entity recognition in clinical text.

Journal: Journal of Biomedical Informatics
Published Date:

Abstract

OBJECTIVES: Named entity recognition (NER), a sequence labeling task, is one of the fundamental tasks in building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large numbers of annotated samples, which are expensive to produce because annotation requires domain experts. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task: identifying medical problems, treatments, and lab tests in clinical notes.
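
To make the sample selection idea concrete, the following is a minimal sketch of one common AL strategy, uncertainty sampling, applied to sentences in an unlabeled pool. It is an illustration only and is not claimed to be any of the methods evaluated in this study. The function names, the mean-token-entropy score, the toy pool, and the label set (O, problem, treatment) are all assumptions made for the example; in practice the per-token label distributions would come from a trained NER model.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sentence_uncertainty(token_distributions):
    """Mean token entropy of a sentence; higher means the model is less certain."""
    if not token_distributions:
        return 0.0
    total = sum(token_entropy(d) for d in token_distributions)
    return total / len(token_distributions)


def select_for_annotation(pool, batch_size):
    """Return the ids of the `batch_size` most uncertain sentences.

    `pool` maps a sentence id to a list of per-token label distributions
    (hypothetical model output, one probability list per token).
    """
    ranked = sorted(
        pool,
        key=lambda sid: sentence_uncertainty(pool[sid]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical unlabeled pool: three two-token sentences, with assumed
# label order (O, problem, treatment) in each distribution.
pool = {
    "s1": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # confident predictions
    "s2": [[0.34, 0.33, 0.33], [0.50, 0.40, 0.10]],  # very uncertain
    "s3": [[0.70, 0.20, 0.10], [0.90, 0.05, 0.05]],  # moderately uncertain
}

selected = select_for_annotation(pool, batch_size=2)
print(selected)
```

In a full AL loop, the selected sentences would be sent to an annotator, added to the training set, and the model retrained before the next round of selection. Averaging entropy over tokens is one design choice among several; summing instead of averaging would favor longer sentences, which may raise the annotation cost per selected sample.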

Authors

  • Yukun Chen
    Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, TN, USA.
  • Thomas A Lasko
    Vanderbilt University School of Medicine, Nashville, TN.
  • Qiaozhu Mei
    University of Michigan, Ann Arbor, MI.
  • Joshua C Denny
    Vanderbilt University, Nashville, TN.
  • Hua Xu
    Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.