PLRTE: Progressive learning for biomedical relation triplet extraction using large language models.

Journal: Journal of biomedical informatics
Published Date:

Abstract

Document-level relation triplet extraction is crucial in biomedical text mining, aiding drug discovery and the construction of biomedical knowledge graphs. Current language models struggle to generalize to unseen datasets and relation types in biomedical relation triplet extraction, which limits their effectiveness in these tasks. To address this challenge, our study optimizes models along two dimensions, data-task relevance and relation granularity, with the aim of improving their generalization capabilities. We introduce a novel progressive learning strategy to obtain the PLRTE model. This strategy enhances the model's ability to comprehend diverse relation types in the biomedical domain and implements a structured four-level progressive learning process through semantic relation augmentation, compositional instruction, and dual-axis level learning. Our experiments on the DDI and BC5CDR document-level biomedical relation triplet datasets demonstrate a performance improvement of 5% to 20% over the current state-of-the-art baselines. Furthermore, our model generalizes well to the unseen ChemProt and GDA datasets, further validating the effectiveness of optimizing data-task relevance and relation granularity for enhancing model generalizability.

Authors

  • Yi-Kai Zheng
    School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510000, China; Guangzhou National Laboratory, No. 9 XingDaoHuanBei Road, Guangzhou International Bio Island, Guangzhou 510005, China.
  • Bi Zeng
    School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510000, China. Electronic address: zb9215@gdut.edu.cn.
  • Yi-Chun Feng
    Guangzhou National Laboratory, No. 9 XingDaoHuanBei Road, Guangzhou International Bio Island, Guangzhou 510005, China; Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China.
  • Lu Zhou
    School of Environment, Tsinghua University, Beijing 100084, P. R. China. Electronic address: zhoulu@mail.tsinghua.edu.cn.
  • Yi-Xue Li
    Guangzhou National Laboratory, No. 9 XingDaoHuanBei Road, Guangzhou International Bio Island, Guangzhou 510005, China; Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China. Electronic address: yxli@sibs.ac.cn.