CoRTEx: contrastive learning for representing terms via explanations with applications on constructing biomedical knowledge graphs.

Journal: Journal of the American Medical Informatics Association : JAMIA
PMID:

Abstract

OBJECTIVES: Biomedical knowledge graphs play a pivotal role in various biomedical research domains. Term clustering, which aims to identify synonymous terms, is a crucial step in constructing these knowledge graphs. Previous contrastive learning models trained with Unified Medical Language System (UMLS) synonyms lack sufficient background knowledge, so they struggle to cluster difficult terms and do not generalize well beyond UMLS terms. In this work, we leverage the world knowledge of large language models (LLMs) and propose Contrastive Learning for Representing Terms via Explanations (CoRTEx) to enhance term representation and significantly improve term clustering.

Authors

  • Huaiyuan Ying
    Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, 100084, China.
  • Zhengyun Zhao
    Center for Statistical Science, Tsinghua University, Beijing, China; Department of Industrial Engineering, Tsinghua University, Beijing, China.
  • Yang Zhao
    The George Institute for Global Health, Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia.
  • Sihang Zeng
    Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, United States.
  • Sheng Yu
    Medical College, Guangxi University of Science and Technology, Liuzhou, Guangxi, 545005, China.