A study on pharmaceutical text relationship extraction based on heterogeneous graph neural networks.

Journal: Mathematical biosciences and engineering : MBE
PMID:

Abstract

Effective information extraction from pharmaceutical texts is of great significance for clinical research. Ancient Chinese medicine texts have concise sentences and complex semantic relationships, and textual relationships may exist between heterogeneous entities. Current mainstream relationship extraction models do not take into account the associations between entities and relationships during extraction, so the semantic information they capture is insufficient to form an effective structured representation. In this paper, we propose a heterogeneous graph neural network relationship extraction model adapted to traditional Chinese medicine (TCM) text. First, the given sentence and the predefined relationships are embedded with fine-tuned bidirectional encoder representations from transformers (BERT) word embeddings, which serve as the model input. Second, a heterogeneous graph network is constructed to associate word, phrase, and relationship nodes and obtain the hidden-layer representation. Then, in the decoding stage, a two-stage subject-object entity identification method is adopted: the identifier uses binary classifiers to locate the start and end positions of TCM entities, identifies all subject and object entities in the sentence, and finally forms the TCM entity-relationship groups. Experiments on the TCM relationship extraction dataset show that the heterogeneous graph neural network with BERT embeddings achieves a precision of 86.99% and an F1 value of 87.40%, an improvement of 8.83% and 10.21% compared with the CNN, BERT-CNN, and Graph LSTM relationship extraction models.
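The two-stage decoding described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The per-token start/end probabilities, which the binary classifiers would produce from the heterogeneous graph hidden states, are hard-coded. `toy_object_scorer` is a hypothetical stand-in for the subject-conditioned object tagger, the relation labels are hypothetical, and pairing each start with the nearest following end is a common heuristic assumed here rather than taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]          # inclusive (start, end) token indices
Triple = Tuple[str, str, str]   # (subject, relation, object)
# Hypothetical object tagger: (subject span, relation) -> (start probs, end probs)
ObjectScorer = Callable[[Span, str], Tuple[Sequence[float], Sequence[float]]]


def decode_spans(start_probs: Sequence[float], end_probs: Sequence[float],
                 threshold: float = 0.5) -> List[Span]:
    """Turn per-token binary start/end probabilities into entity spans.

    Each start above the threshold is paired with the nearest end above the
    threshold at or after it (an assumed heuristic, not from the paper).
    """
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


def extract_triples(tokens: Sequence[str], subject_start: Sequence[float],
                    subject_end: Sequence[float], relations: Sequence[str],
                    object_scorer: ObjectScorer,
                    threshold: float = 0.5) -> List[Triple]:
    """Stage 1 finds subjects; stage 2 finds objects per (subject, relation)."""
    triples = []
    for subj in decode_spans(subject_start, subject_end, threshold):
        subj_text = "".join(tokens[subj[0]:subj[1] + 1])  # Chinese: no spaces
        for rel in relations:
            obj_start, obj_end = object_scorer(subj, rel)
            for obj in decode_spans(obj_start, obj_end, threshold):
                obj_text = "".join(tokens[obj[0]:obj[1] + 1])
                triples.append((subj_text, rel, obj_text))
    return triples


def toy_object_scorer(subject: Span, relation: str):
    """Hard-coded stand-in; a real tagger would condition on `subject`."""
    if relation == "efficacy":
        return [0.0, 0.0, 0.9, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.9]
    return [0.0] * 6, [0.0] * 6


tokens = list("人参大补元气")  # "ginseng greatly replenishes primordial qi"
triples = extract_triples(
    tokens,
    subject_start=[0.9, 0.1, 0.1, 0.1, 0.1, 0.1],
    subject_end=[0.1, 0.8, 0.1, 0.1, 0.1, 0.1],
    relations=["efficacy", "contraindication"],  # hypothetical labels
    object_scorer=toy_object_scorer,
)
print(triples)  # [('人参', 'efficacy', '大补元气')]
```

Looping over every predefined relation for each detected subject lets a single subject take part in several relations, and letting each relation yield several object spans handles overlapping triples; in the sketch only the hypothetical "efficacy" relation fires.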

Authors

  • Shuilong Zou
    Nanchang Institute of Science & Technology, Nanchang 330004, China.
  • Zhaoyang Liu
    Northwest Women and Children Hospital, Xi'an, China.
  • Kaiqi Wang
    School of Computer, Jiangxi University of Chinese Medicine, Nanchang 330004, China.
  • Jun Cao
    Department of Psychiatry, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.
  • Shixiong Liu
    Nanchang Institute of Science & Technology, Nanchang 330004, China.
  • Wangping Xiong
    School of Computer, Jiangxi University of Chinese Medicine, Nanchang 330004, China.
  • Shaoyi Li
    School of Artificial Intelligence, Nanchang Institute of Science and Technology, Nanchang 330108, China.