Distantly Supervised Biomedical Relation Extraction via Negative Learning and Noisy Student Self-Training.

Journal: IEEE/ACM transactions on computational biology and bioinformatics
Published Date:

Abstract

Biomedical relation extraction aims to identify relationships among entities, such as gene associations and drug interactions, in biomedical texts. Despite advances in relation extraction in general domains, the scarcity of labeled training data remains a significant challenge in the biomedical field. This paper presents a novel approach to biomedical relation extraction that combines a noisy student self-training strategy with negative learning. The method addresses data insufficiency by using distantly supervised data to generate high-quality labeled samples. Negative learning, in contrast to traditional positive learning, offers a more robust mechanism for identifying and relabeling noisy samples, which helps prevent the model from overfitting to them. Together, these techniques improve noise reduction and relabeling, leading to better performance even on noisy datasets. Experimental results demonstrate that the proposed framework mitigates the impact of noisy data and outperforms existing benchmarks.
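
The abstract does not give the loss function or the relabeling rule, so the following is only a minimal sketch of how negative learning is commonly formulated, not the authors' implementation. It assumes the usual complementary-label form: instead of training on "this sample has label y", the model is trained on "this sample does not have label k", with loss -log(1 - p_k). The function names, the thresholds, and the `relabel` rule are hypothetical choices made for illustration.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def positive_loss(logits, label):
    """Standard cross-entropy: pushes p[label] toward 1."""
    return -math.log(softmax(logits)[label])


def negative_loss(logits, complementary_label):
    """Negative-learning loss: pushes p[complementary_label] toward 0."""
    p = softmax(logits)[complementary_label]
    return -math.log(max(1.0 - p, 1e-12))  # clamp to avoid log(0)


def sample_complementary_label(noisy_label, num_classes, rng=random):
    """Pick a class uniformly at random from all classes except noisy_label."""
    candidates = [c for c in range(num_classes) if c != noisy_label]
    return rng.choice(candidates)


def relabel(logits, noisy_label, keep_threshold=0.5, relabel_threshold=0.9):
    """Hypothetical filtering rule (thresholds are assumptions, not from the paper).

    Keep the distant label if the model supports it, replace it if the model
    is very confident in another class, otherwise return None to discard.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[noisy_label] >= keep_threshold:
        return noisy_label
    if probs[best] >= relabel_threshold:
        return best
    return None
```

The intuition behind the design is that a randomly chosen complementary label is far more likely to be correct ("not this class") than a distantly supervised positive label is, so a wrong label contributes less misleading gradient. In a noisy student loop, a teacher trained this way would relabel or filter the distantly supervised samples before a student model is trained on the cleaned set.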

Authors

  • Yuanfei Dai
    College of Mathematics and Computer Sciences, Fuzhou University, Fujian, China.
  • Bin Zhang
    Department of Psychiatry, Sleep Medicine Center, Nanfang Hospital, Southern Medical University, Guangzhou, China.
  • Shiping Wang
    College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Guangdong Provincial Key Laboratory of Big Data Computing, The Chinese University of Hong Kong, Shenzhen 518172, China. Electronic address: shipingwangphd@163.com.