Identifying synonymy between relational phrases using word embeddings.

Journal: Journal of biomedical informatics
Published Date:

Abstract

Many text mining applications in the biomedical domain benefit from automatically clustering relational phrases into synonymous groups, because doing so alleviates the spurious mismatches caused by the diversity of natural language expressions. Most previous work on this synonymy resolution task uses similarity metrics between relational phrases based on textual strings or dependency paths, which, for the most part, ignore the context around the relations. To overcome this shortcoming, we employ a word embedding technique to encode relational phrases. We then apply the k-means algorithm to these distributional representations to cluster the phrases. Our experimental results show that this approach outperforms state-of-the-art statistical models, including latent Dirichlet allocation and Markov logic networks.
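
The embed-then-cluster idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the phrase vectors below are invented two-dimensional values standing in for learned embeddings, the names PHRASE_VECTORS, farthest_first_init, kmeans, and cluster_phrases are hypothetical, and the deterministic farthest-first initialization is a simplification chosen to keep the toy example reproducible.

```python
import numpy as np

# Hypothetical phrase embeddings (made-up values, for illustration only).
# In practice these would come from a word embedding model trained on text.
PHRASE_VECTORS = {
    "activates": [0.90, 0.10],
    "stimulates": [0.80, 0.20],
    "induces expression of": [0.85, 0.15],
    "inhibits": [0.10, 0.90],
    "suppresses": [0.20, 0.80],
    "blocks activity of": [0.15, 0.85],
}


def farthest_first_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from all centroids chosen so far (an assumed,
    deterministic alternative to random initialization)."""
    centroids = [points[0]]
    while len(centroids) < k:
        chosen = np.array(centroids)
        # Distance from every point to its nearest chosen centroid.
        dists = np.linalg.norm(
            points[:, None, :] - chosen[None, :, :], axis=2
        ).min(axis=1)
        centroids.append(points[dists.argmax()])
    return np.array(centroids)


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first_init(points, k)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        dists = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        # Update step: mean of the assigned points (keep empty clusters fixed).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def cluster_phrases(phrase_vectors, k):
    """Group phrases whose vectors fall into the same k-means cluster."""
    phrases = list(phrase_vectors)
    points = np.array([phrase_vectors[p] for p in phrases], dtype=float)
    labels = kmeans(points, k)
    groups = {}
    for phrase, label in zip(phrases, labels):
        groups.setdefault(int(label), []).append(phrase)
    return groups


if __name__ == "__main__":
    for label, members in cluster_phrases(PHRASE_VECTORS, k=2).items():
        print(label, members)
```

With these toy vectors and k = 2, the activation-like phrases and the inhibition-like phrases fall into separate groups. On real data, the number of clusters and the embedding model would have to be chosen and evaluated empirically.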

Authors

  • Nhung T H Nguyen
    University of Science, Vietnam National University, Ho Chi Minh City, 227 Nguyen Van Cu St., Ward 4, Dist. 5, Ho Chi Minh City, Viet Nam; Japan Advanced Institute of Science and Technology, 1-8 Asahidai, Nomi-shi, Ishikawa 923-1292, Japan. Electronic address: nthnhung@jaist.ac.jp.
  • Makoto Miwa
  • Yoshimasa Tsuruoka
    The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan. Electronic address: tsuruoka@logos.t.u-tokyo.ac.jp.
  • Satoshi Tojo
    Japan Advanced Institute of Science and Technology, 1-8 Asahidai, Nomi-shi, Ishikawa 923-1292, Japan. Electronic address: tojo@jaist.ac.jp.