EGFI: drug-drug interaction extraction and generation with fusion of enriched entity and sentence information.

Journal: Briefings in bioinformatics
Published Date:

Abstract

MOTIVATION: The rapidly growing biomedical literature contains diverse and comprehensive knowledge, such as drug interactions, that remains to be mined. However, it is difficult to extract this heterogeneous knowledge efficiently in order to retrieve, or even discover, the latest and novel findings. To address this problem, we propose EGFI for extracting and consolidating drug interactions from large-scale medical literature text data. Specifically, EGFI consists of two parts: classification and generation. In the classification part, EGFI builds on the language model BioBERT, which has been comprehensively pretrained on a biomedical corpus. In particular, we propose using a multihead self-attention mechanism and packed BiGRU to fuse multiple sources of semantic information for rigorous context modeling. In the generation part, EGFI utilizes another pretrained language model, BioGPT-2, where the generated sentences are selected based on filtering rules.
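
The abstract does not state which filtering rules are applied to the generated sentences. As a purely illustrative sketch, the snippet below shows what rule-based selection of generated candidates could look like. The function name `filter_generated`, the `min_tokens` threshold, and the three rules (both drug names present, minimum length, no duplicates) are assumptions made for this example and are not taken from EGFI.

```python
from typing import List


def filter_generated(sentences: List[str], drug_a: str, drug_b: str,
                     min_tokens: int = 5) -> List[str]:
    """Keep candidate sentences that pass three hypothetical rules.

    These rules are illustrative assumptions, not the rules used by EGFI:
      1. the sentence mentions both drug names (case-insensitive),
      2. the sentence has at least `min_tokens` whitespace-separated tokens,
      3. the sentence is not a duplicate of one already kept.
    """
    kept: List[str] = []
    seen = set()
    for sentence in sentences:
        # Lowercase and collapse whitespace so near-identical strings compare equal.
        normalized = " ".join(sentence.lower().split())
        if drug_a.lower() not in normalized or drug_b.lower() not in normalized:
            continue  # rule 1: both entities must appear
        if len(normalized.split()) < min_tokens:
            continue  # rule 2: drop fragments that are too short
        if normalized in seen:
            continue  # rule 3: drop duplicates
        seen.add(normalized)
        kept.append(sentence.strip())
    return kept


candidates = [
    "Aspirin may increase the anticoagulant effect of warfarin.",
    "Aspirin is a drug.",
    "aspirin may increase the anticoagulant effect of  warfarin.",
    "Warfarin aspirin.",
]
selected = filter_generated(candidates, "aspirin", "warfarin")
```

In this sketch only the first candidate survives: the second lacks one of the drug names, the third is a duplicate once case and whitespace are normalized, and the fourth is too short.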

Authors

  • Lei Huang
    School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China.
  • Jiecong Lin
    Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR.
  • Xiangtao Li
  • Linqi Song
    Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR.
  • Zetian Zheng
    Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR.
  • Ka-Chun Wong