Exploiting Unlabeled Texts with Clustering-based Instance Selection for Medical Relation Classification.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:

Abstract

Classifying relations between pairs of medical concepts in clinical texts is a crucial task to acquire empirical evidence relevant to patient care. Due to limited labeled data and extremely unbalanced class distributions, medical relation classification systems struggle to achieve good performance on less common relation types, which capture valuable information that is important to identify. Our research aims to improve relation classification using weakly supervised learning. We present two clustering-based instance selection methods that acquire a diverse and balanced set of additional training instances from unlabeled data. The first method selects one representative instance from each cluster containing only unlabeled data. The second method selects a counterpart for each training instance using clusters containing both labeled and unlabeled data. These new instance selection methods for weakly supervised learning achieve substantial recall gains for the minority relation classes compared to supervised learning, while yielding comparable performance on the majority relation classes.

Authors

  • Youngjun Kim
  • Ellen Riloff
    School of Computing, University of Utah, Salt Lake City, UT.
  • Stéphane M Meystre
    Department of Biomedical Informatics, University of Utah, Salt Lake City, USA.