Semantic projection recovers rich human knowledge of multiple object features from word embeddings.

Journal: Nature Human Behaviour

Abstract

How is knowledge about word meaning represented in the mental lexicon? Current computational models infer word meanings from lexical co-occurrence patterns. They learn to represent words as vectors in a multidimensional space, wherein words that are used in more similar linguistic contexts (that is, words that are more semantically related) are located closer together. However, whereas inter-word proximity captures only overall relatedness, human judgements are highly context dependent. For example, dolphins and alligators are similar in size but differ in dangerousness. Here, we use a domain-general method to extract context-dependent relationships from word embeddings: 'semantic projection' of word vectors onto lines that represent features such as size (the line connecting the words 'small' and 'big') or danger ('safe' to 'dangerous'), analogous to 'mental scales'. This method recovers human judgements across various object categories and properties. Thus, the geometry of word embeddings explicitly represents a wealth of context-dependent world knowledge.
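The core operation described in the abstract, projecting a word vector onto the line between two feature-pole words, can be sketched in a few lines. The toy 4-dimensional embeddings below are invented for illustration only (real experiments would use pretrained vectors such as GloVe or word2vec), and the function name `semantic_projection` is our own label, not code from the paper:

```python
import numpy as np

# Hypothetical toy embeddings, invented for illustration only.
emb = {
    "small":    np.array([0.1, 0.9, 0.0, 0.2]),
    "big":      np.array([0.9, 0.1, 0.3, 0.8]),
    "mouse":    np.array([0.2, 0.8, 0.1, 0.3]),
    "elephant": np.array([0.8, 0.2, 0.2, 0.7]),
}

def semantic_projection(word, neg="small", pos="big"):
    """Scalar position of `word` along the feature line from `neg` to `pos`.

    The feature line is the difference vector (pos - neg); the score is
    the length of the word vector's projection onto that direction.
    """
    line = emb[pos] - emb[neg]                       # feature direction
    return float(np.dot(emb[word], line) / np.linalg.norm(line))

# A larger score means the word sits closer to the "big" end of the scale.
scores = {w: semantic_projection(w) for w in ("mouse", "elephant")}
```

With these toy vectors, "elephant" receives a higher size score than "mouse", mirroring the kind of human judgement the method is meant to recover; swapping in a 'safe'-to-'dangerous' pole pair would instead rank the same words by dangerousness.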

Authors

  • Gabriel Grand
    Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA.
  • Idan Asher Blank
    Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Francisco Pereira
    National Institute of Mental Health, Bethesda, MD, USA.
  • Evelina Fedorenko
    Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; evelina9@mit.edu.