Self-training in significance space of support vectors for imbalanced biomedical event data.

Journal: BMC Bioinformatics
Published Date:

Abstract

BACKGROUND: Pairwise relationships extracted from biomedical literature are insufficient for formulating biomolecular interactions. Extraction of complex relations (namely, biomedical events) has become the main focus of the text-mining community. However, existing systems seldom address two critical issues. First, the annotated corpora used to train prediction models are highly imbalanced. Second, training supervised models on only a single annotated corpus can limit system performance. Fortunately, there is a large pool of unlabeled data that contains much of the domain background knowledge and can be exploited.

Authors

  • Tsendsuren Munkhdalai
  • Oyun-Erdene Namsrai
  • Keun Ryu