Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons.

Journal: Nature Communications
PMID:

Abstract

In order to better understand how the brain perceives faces, it is important to know what objective drives learning in the ventral visual stream. To answer this question, we model neural responses to faces in the macaque inferotemporal (IT) cortex with a deep self-supervised generative model, β-VAE, which disentangles sensory data into interpretable latent factors, such as gender or age. Our results demonstrate a strong correspondence between the generative factors discovered by β-VAE and those coded by single IT neurons, beyond that found for the baselines, including the handcrafted state-of-the-art model of face perception, the Active Appearance Model, and deep classifiers. Moreover, β-VAE is able to reconstruct novel face images using signals from just a handful of cells. Together, our results imply that optimising the disentangling objective leads to representations that closely resemble those in IT at the single-unit level. This points to disentangling as a plausible learning objective for the visual brain.
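The disentangling objective the abstract refers to is the β-VAE loss of Higgins et al. (2017): a reconstruction term plus a KL divergence between the approximate posterior and a standard normal prior, with the KL term up-weighted by a factor β > 1. A minimal NumPy sketch of that objective is shown below; the function name, the squared-error reconstruction term, and the default β are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Sketch of the β-VAE objective (Higgins et al., 2017).

    Reconstruction error plus a β-weighted KL divergence between the
    diagonal-Gaussian posterior N(mu, diag(exp(log_var))) and the
    standard normal prior N(0, I). Setting beta > 1 increases the
    pressure on the latent factors to disentangle.
    """
    # Reconstruction term: per-example squared error
    # (a simple proxy for a Gaussian likelihood).
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    # Analytic KL divergence between two diagonal Gaussians:
    # KL(N(mu, sigma^2) || N(0, 1)) summed over latent dimensions.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)
    # Average the β-weighted objective over the batch.
    return np.mean(recon + beta * kl)
```

With a perfect reconstruction and a posterior equal to the prior (mu = 0, log_var = 0), both terms vanish and the loss is zero; increasing β leaves such a solution unchanged but penalises informative posteriors more heavily during training.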

Authors

  • Irina Higgins
    DeepMind, London, UK. irinah@google.com.
  • Le Chang
    Division of Biology and Biological Engineering, Computation and Neural Systems, Caltech, Pasadena, CA 91125, USA; Institute of Neuroscience, Key Laboratory of Primate Neurobiology, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China. Electronic address: lechang@ion.ac.cn.
  • Victoria Langston
    DeepMind, London, UK.
  • Demis Hassabis
    Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.
  • Christopher Summerfield
    DeepMind, 5 New Street Square, London, UK; Department of Experimental Psychology, University of Oxford, Oxford, UK.
  • Doris Tsao
    Caltech, Pasadena, USA.
  • Matthew Botvinick
    DeepMind, London, UK. botvinick@google.com.