Multimodal neural networks better explain multivoxel patterns in the hippocampus.

Journal: Neural Networks: the official journal of the International Neural Network Society

Abstract

The human hippocampus possesses "concept cells", neurons that fire when presented with stimuli belonging to a specific concept, regardless of the modality. Recently, similar concept cells were discovered in a multimodal network called CLIP (Radford et al., 2021). Here, we ask whether CLIP can explain the fMRI activity of the human hippocampus better than a purely visual (or linguistic) model. We extend our analysis to a range of publicly available uni- and multi-modal models. We demonstrate that "multimodality" stands out as a key component when assessing the ability of a network to explain the multivoxel activity in the hippocampus.
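Comparing how well a network's representations explain multivoxel brain activity is often done with representational similarity analysis (RSA): the pairwise dissimilarity structure of the model's stimulus embeddings is correlated with that of the voxel patterns. The paper's exact analysis pipeline may differ; below is a minimal RSA-style sketch on synthetic data, where `model_feats` stands in for embeddings from a candidate model (e.g. CLIP) and `voxels` for hippocampal multivoxel responses to the same stimuli.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stim, n_feat, n_vox = 20, 64, 100

# Hypothetical stimulus embeddings from a candidate model, and simulated
# multivoxel activity that is partly driven by those embeddings plus noise.
model_feats = rng.standard_normal((n_stim, n_feat))
W = rng.standard_normal((n_feat, n_vox))
voxels = model_feats @ W + 0.5 * rng.standard_normal((n_stim, n_vox))

# Representational dissimilarity matrices: 1 - correlation between the
# response patterns for each pair of stimuli.
rdm_model = 1.0 - np.corrcoef(model_feats)
rdm_brain = 1.0 - np.corrcoef(voxels)

# Compare the two representational geometries by correlating the upper
# triangles of the RDMs (a higher value = the model better explains the
# multivoxel similarity structure).
iu = np.triu_indices(n_stim, k=1)
rsa_score = np.corrcoef(rdm_model[iu], rdm_brain[iu])[0, 1]
print(f"model-brain RSA correlation: {rsa_score:.2f}")
```

Running the same comparison for several candidate networks (visual-only, linguistic-only, multimodal) and ranking their `rsa_score` values is one standard way to operationalize the question asked in the abstract.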

Authors

  • Bhavin Choksi
    CerCO, CNRS UMR5549, Toulouse, France; Université de Toulouse, France. Electronic address: bhavin.choksi@cnrs.fr.
  • Milad Mozafari
    CerCO, CNRS UMR5549, Toulouse, France; IRIT CNRS, UMR5505, Toulouse, France.
  • Rufin VanRullen
    Université de Toulouse.
  • Leila Reddy
    Université de Toulouse.