Representation of locomotive action affordances in human behavior, brains, and deep neural networks.

Journal: Proceedings of the National Academy of Sciences of the United States of America
Published Date:

Abstract

To decide how to move around the world, we must determine which locomotive actions (e.g., walking, swimming, or climbing) are afforded by the immediate visual environment. The neural basis of our ability to recognize locomotive affordances is unknown. Here, we compare human behavioral annotations, functional MRI (fMRI) measurements, and deep neural network (DNN) activations to both indoor and outdoor real-world images to demonstrate that the human visual cortex represents locomotive action affordances in complex visual scenes. Hierarchical clustering of behavioral annotations of six possible locomotive actions shows that humans group environments into distinct affordance clusters using at least three separate dimensions. Representational similarity analysis of multivoxel fMRI responses in the scene-selective visual cortex shows that perceived locomotive affordances are represented independently of other scene properties such as objects, surface materials, scene category, or global properties, and independently of the task performed in the scanner. Visual feature activations from DNNs trained on object or scene classification, as well as on a range of other visual understanding tasks, correlate less strongly with behavioral and neural representations of locomotive affordances than with object representations. Training DNNs directly on affordance labels or using affordance-centered language embeddings increases alignment with human behavior, but none of the tested models fully captures locomotive action affordance perception. These results uncover a type of representation in the human brain that reflects locomotive action affordances.
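For readers unfamiliar with representational similarity analysis (RSA), the sketch below illustrates the general technique as it is commonly computed: build a representational dissimilarity matrix (RDM) for each data source and correlate their upper triangles. This is a generic, hypothetical illustration with placeholder data (e.g., `affordance_annotations`, `voxel_patterns`), not the authors' analysis pipeline or data.

```python
# Minimal, generic sketch of representational similarity analysis (RSA):
# build a representational dissimilarity matrix (RDM) per data source and
# compare RDMs via Spearman correlation of their upper triangles.
# Variable names (affordance_annotations, voxel_patterns) are hypothetical
# placeholders, not the authors' data structures.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def compute_rdm(patterns: np.ndarray) -> np.ndarray:
    """patterns: (n_stimuli, n_features) -> (n_stimuli, n_stimuli) dissimilarity matrix."""
    # Correlation distance (1 - Pearson r) between every pair of stimulus patterns
    return squareform(pdist(patterns, metric="correlation"))

def compare_rdms(rdm_a: np.ndarray, rdm_b: np.ndarray) -> float:
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho

# Example with random stand-in data: 90 scene images,
# 6 behavioral affordance ratings vs. 500 voxels of fMRI responses.
rng = np.random.default_rng(0)
affordance_annotations = rng.random((90, 6))   # stimuli x affordance ratings
voxel_patterns = rng.random((90, 500))         # stimuli x voxel responses

behavioral_rdm = compute_rdm(affordance_annotations)
neural_rdm = compute_rdm(voxel_patterns)
print(f"RDM correlation (Spearman rho): {compare_rdms(behavioral_rdm, neural_rdm):.3f}")
```

In this framework, a separate RDM built from, say, object or material annotations can be compared against the neural RDM in the same way, which is how contributions of different scene properties are typically disentangled; the specific model comparisons in the paper are described in its Methods.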

Authors

  • Clemens G Bartnik
    Informatics Institute, Video and Image Sense Lab, University of Amsterdam, Amsterdam 1098 XH, The Netherlands.
  • Christina Sartzetaki
    Informatics Institute, Video and Image Sense Lab, University of Amsterdam, Amsterdam 1098 XH, The Netherlands.
  • Abel Puigseslloses Sanchez
    Informatics Institute, Video and Image Sense Lab, University of Amsterdam, Amsterdam 1098 XH, The Netherlands.
  • Elijah Molenkamp
    Informatics Institute, Video and Image Sense Lab, University of Amsterdam, Amsterdam 1098 XH, The Netherlands.
  • Steven Bommer
    Informatics Institute, Video and Image Sense Lab, University of Amsterdam, Amsterdam 1098 XH, The Netherlands.
  • Nikolina Vukšić
    Informatics Institute, Video and Image Sense Lab, University of Amsterdam, Amsterdam 1098 XH, The Netherlands.
  • Iris I A Groen
    Informatics Institute, Video and Image Sense Lab, University of Amsterdam, Amsterdam 1098 XH, The Netherlands.