Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists.

Journal: NeuroImage
Published Date:

Abstract

The 21st century marks the emergence of "big data" with a rapid increase in the availability of datasets with multiple measurements. In neuroscience, brain-imaging datasets are more commonly accompanied by dozens or hundreds of phenotypic subject descriptors on the behavioral, neural, and genomic level. The complexity of such "big data" repositories offers new opportunities and poses new challenges for systems neuroscience. Canonical correlation analysis (CCA) is a prototypical family of methods that is useful in identifying the links between variable sets from different modalities. Importantly, CCA is well suited to describing relationships across multiple sets of data, such as in recently available big biomedical datasets. Our primer discusses the rationale, promises, and pitfalls of CCA.

Authors

  • Hao-Ting Wang
    Department of Psychology, University of York, Heslington, York, England, UK. Electronic address: haoting.wang@york.ac.uk.
  • Jonathan Smallwood
    Department of Psychology, University of York, Heslington, York, England, UK.
  • Janaina Mourão-Miranda
    Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom.
  • Cedric Huchuan Xia
    Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
  • Theodore D Satterthwaite
    Department of Psychiatry, University of Pennsylvania School of Medicine, Philadelphia, PA, USA.
  • Danielle S Bassett
    University of Pennsylvania.
  • Danilo Bzdok
    Department of Psychiatry at the RWTH Aachen University in Germany and a Visiting Professor at INRIA/Neurospin Saclay in France.