Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.

Journal: Journal of Biomedical Semantics
Published Date:

Abstract

BACKGROUND: Biomedical text mining can target many kinds of valuable information embedded in the literature, but a critical obstacle to extending the range of mining targets is the cost of manually constructing the labeled data required by state-of-the-art supervised learning systems. Active learning reduces the amount of manual annotation required by selecting the most informative documents for supervised learning. Previous work on active learning, however, has focused on entity recognition and protein-protein interaction extraction, not on event extraction tasks involving multiple event types. It has also not considered evidence of event participants, which may be a clue to the presence of events in unlabeled documents. Moreover, the confidence scores that event extraction systems assign to events are not reliable for ranking documents by their informativity for supervised learning. Here we propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of the confidence scores from event extraction systems.
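
To make the idea of committee-based document selection concrete, the following is a minimal generic sketch of query-by-committee ranking using vote entropy. It is not the statistical informativity method proposed in the paper, which the abstract does not specify. The function names, document ids, and event labels are all hypothetical, and the sketch assumes each committee member assigns a single event type per document.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's label votes for one document.

    The value is zero when all members agree and grows with disagreement.
    """
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def rank_by_disagreement(committee_votes):
    """Order document ids from most to least committee disagreement."""
    return sorted(
        committee_votes,
        key=lambda doc_id: vote_entropy(committee_votes[doc_id]),
        reverse=True,
    )


# Hypothetical output of a three-member committee on unlabeled documents.
committee_votes = {
    "doc_a": ["Binding", "Binding", "Binding"],
    "doc_b": ["Binding", "Regulation", "None"],
    "doc_c": ["Binding", "Binding", "Regulation"],
}

# Documents at the front of the ranking would be sent for annotation first.
ranking = rank_by_disagreement(committee_votes)
```

In this sketch, documents on which the committee disagrees most are treated as the most informative candidates for manual annotation, without relying on any single extractor's confidence score.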

Authors

  • Xu Han
  • Jung-Jae Kim
  • Chee Keong Kwoh
    School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore.