Open-world semi-supervised relation extraction.

Journal: Neural networks : the official journal of the International Neural Network Society
PMID:

Abstract

Semi-supervised relation extraction methods play an important role in extracting relationships from unstructured text because they leverage both labeled and unlabeled data to improve extraction accuracy. However, these methods rest on the closed-world assumption, in which the relationship types of labeled and unlabeled data belong to the same closed set, and are therefore not applicable to real-world scenarios that involve novel relationships. To address this issue, this paper proposes an open-world semi-supervised relation extraction task and a novel method, Seen relation Identification and Novel relation Discovery (SIND), to extract both seen and novel relations simultaneously. Specifically, SIND develops a contrastive learning strategy to improve the semantic representation of relations and incorporates a cluster-aware method that discovers novel relations by leveraging the pairwise similarity between samples in the feature space. Additionally, SIND uses maximum entropy theory to define the prior distribution, addressing the learning pace imbalance caused by the absence of labeled data for novel classes. Experimental results on three widely used benchmark datasets demonstrate that SIND achieves significant improvements over baseline models. This study explores the challenge of discovering relationships within unannotated data and presents a reference approach for various natural language processing tasks in open-world scenarios, such as text classification and named entity recognition. The datasets and source code of this work are available at https://github.com/a-home-bird/SIND.
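
The two ideas named in the abstract, pairwise similarity in the feature space and a maximum entropy prior, can be illustrated with a minimal NumPy sketch. This is not the SIND implementation (see the repository linked above for that); the function names, the nearest-neighbor pairing rule, and the exact loss forms are assumptions chosen for illustration, in the style of pairwise objectives commonly used in open-world semi-supervised learning.

```python
import numpy as np

def nearest_neighbor_pairs(features):
    """Return, for each sample, the index of its most cosine-similar other sample.

    Hypothetical helper: pairing by nearest neighbor is an assumption,
    not a detail stated in the abstract.
    """
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    return sim.argmax(axis=1)

def pairwise_loss(probs, neighbors):
    """Negative log of the agreement between each sample and its paired neighbor.

    The loss is low when paired samples are assigned to the same class,
    whether that class is seen or novel.
    """
    agreement = np.sum(probs * probs[neighbors], axis=1)
    return float(-np.mean(np.log(agreement + 1e-12)))

def entropy_regularizer(probs):
    """Negative entropy of the batch-mean prediction.

    Minimizing this term pushes the average prediction toward the uniform
    (maximum entropy) distribution, which discourages the model from
    collapsing all unlabeled samples onto the seen classes.
    """
    mean = probs.mean(axis=0)
    return float(np.sum(mean * np.log(mean + 1e-12)))
```

In a training loop these two terms would typically be added to a supervised loss on the labeled data; the weighting between them is a further design choice that the abstract does not specify.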

Authors

  • Diange Zhou
    Arthritis Clinical and Research Center, Peking University People's Hospital, Peking University, Beijing, China.
  • Yilin Duan
    School of Computer Science, China University of Geosciences, Wuhan, 430078, China. Electronic address: duanyl@cug.edu.cn.
  • Shengwen Li
    School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China. swli@cug.edu.cn.
  • Hong Yao
    School of Computer Science, China University of Geosciences, Wuhan 430074, China. yaohong@cug.edu.cn.