Open-world semi-supervised relation extraction.
Journal:
Neural networks : the official journal of the International Neural Network Society
PMID:
39987714
Abstract
Semi-supervised relation extraction methods play an important role in extracting relationships from unstructured text because they can leverage both labeled and unlabeled data to improve extraction accuracy. However, these methods rest on the closed-world assumption, under which the relation types of labeled and unlabeled data belong to the same closed set, so they are not applicable to real-world scenarios that involve novel relations. To address this issue, this paper proposes an open-world semi-supervised relation extraction task and a novel method, Seen relation Identification and Novel relation Discovery (SIND), which extracts both seen and novel relations simultaneously. Specifically, SIND develops a contrastive learning strategy to improve the semantic representation of relations and incorporates a cluster-aware method that discovers novel relations by leveraging the pairwise similarity between samples in the feature space. Additionally, SIND uses the maximum entropy principle to define the prior distribution, which addresses the learning pace imbalance caused by the absence of labeled data for novel classes. Experimental results on three widely used benchmark datasets demonstrate that SIND achieves significant improvements over baseline models. This study explores the challenge of discovering relations within unannotated data and offers a reference approach for other natural language processing tasks in open-world scenarios, such as text classification and named entity recognition. The datasets and source code of this work are available at https://github.com/a-home-bird/SIND.
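The two mechanisms named in the abstract, pairwise similarity in the feature space and a maximum-entropy prior, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes one common reading of these ideas: each sample is paired with its most cosine-similar neighbor as a candidate same-relation pair, and the entropy of the batch-averaged prediction is used as a term to maximize so that predictions do not collapse onto the seen classes. The function names `nearest_neighbor_pairs` and `mean_prediction_entropy` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_prediction_entropy(batch_logits):
    """Entropy of the batch-averaged class distribution.

    Assumption: a maximum-entropy prior is approximated by rewarding a
    high value here, which pushes the average prediction toward uniform.
    """
    probs = [softmax(row) for row in batch_logits]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return -sum(q * math.log(q) for q in mean if q > 0.0)


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbor_pairs(features):
    """Pair each sample with its most cosine-similar other sample.

    Assumption: such pairs act as pseudo-positive pairs for a
    pairwise objective on unlabeled data.
    """
    pairs = []
    for i, u in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        best = max(others, key=lambda j: cosine(u, features[j]))
        pairs.append((i, best))
    return pairs


if __name__ == "__main__":
    # Toy features: two tight groups in a 2-D feature space.
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(nearest_neighbor_pairs(feats))

    # Collapsed predictions have lower mean-prediction entropy than balanced ones.
    collapsed = [[10.0, 0.0], [10.0, 0.0]]
    balanced = [[10.0, 0.0], [0.0, 10.0]]
    print(mean_prediction_entropy(collapsed), mean_prediction_entropy(balanced))
```

In a training loop under these assumptions, the entropy term would be added with a negative sign to the loss, while the neighbor pairs would feed a pairwise similarity objective over the unlabeled samples.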