Concept embedding to measure semantic relatedness for biomedical information ontologies.

Journal: Journal of biomedical informatics
Published Date:

Abstract

There have been many attempts to identify relationships among concepts that correspond to terms in biomedical information ontologies such as the Unified Medical Language System (UMLS). In particular, vector representations of such concepts, built from UMLS definition texts, are widely used to measure the relatedness between two biological concepts. However, conventional relatedness measures cover only a limited range of words, which limits the performance of these models. In this paper, we propose a concept-embedding model for measuring UMLS semantic relatedness that overcomes the limitations of earlier models. For biological concepts that are not defined in UMLS, we obtained context texts by using Wikipedia as an external knowledge base. Concept vector representations were then derived from these context texts. The degree of relatedness between two concepts was defined as the cosine similarity between the corresponding concept vectors. Our validation showed that the method provides higher coverage and better performance than the conventional method.
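
The relatedness score named in the abstract, cosine similarity between two concept vectors, can be illustrated with a minimal Python sketch. The function name, the concept labels, and the three-dimensional toy vectors below are hypothetical and chosen only for illustration; they are not the authors' implementation or data, and real concept embeddings would have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Assumption: a zero vector is treated as unrelated to everything.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy concept vectors (not taken from the paper).
concept_vectors = {
    "concept_a": [0.9, 0.1, 0.3],
    "concept_b": [0.8, 0.2, 0.4],
    "concept_c": [0.1, 0.9, 0.0],
}

# Relatedness of two concepts is the cosine similarity of their vectors.
related_ab = cosine_similarity(concept_vectors["concept_a"], concept_vectors["concept_b"])
related_ac = cosine_similarity(concept_vectors["concept_a"], concept_vectors["concept_c"])
print(f"a vs b: {related_ab:.3f}")
print(f"a vs c: {related_ac:.3f}")
```

With these toy values, the pair whose vectors point in a similar direction (a and b) scores higher than the pair whose vectors do not (a and c).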

Authors

  • Junseok Park
    Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea.
  • Kwangmin Kim
    Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea.
  • Woochang Hwang
    Milner Therapeutics Institute, University of Cambridge, Cambridge CB2 1TN, UK.
  • Doheon Lee