DeepComBat: A statistically motivated, hyperparameter-robust, deep learning approach to harmonization of neuroimaging data.

Journal: Human Brain Mapping
PMID:

Abstract

Neuroimaging data acquired using multiple scanners or protocols are increasingly available. However, such data exhibit technical artifacts across batches that introduce confounding and decrease reproducibility. This is especially true when multi-batch data are analyzed using complex downstream models, which are more likely to pick up on and implicitly incorporate batch-related information. Previously proposed image harmonization methods have sought to remove these batch effects; however, batch effects remain detectable in the data after these methods are applied. We present DeepComBat, a deep learning harmonization method based on a conditional variational autoencoder and the ComBat method. DeepComBat combines the strengths of statistical and deep learning methods to account for the multivariate relationships between features while relaxing strong assumptions made by previous deep learning harmonization methods. As a result, DeepComBat can perform multivariate harmonization while preserving data structure and avoiding the introduction of synthetic artifacts. We apply this method to cortical thickness measurements from a cognitive-aging cohort and show that DeepComBat qualitatively and quantitatively outperforms existing methods in removing batch effects while preserving biological heterogeneity. Additionally, DeepComBat provides a new perspective for statistically motivated deep learning harmonization methods.
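
For readers unfamiliar with the kind of batch correction the abstract refers to, the following is a minimal sketch of a per-feature location and scale adjustment, the basic idea that ComBat-style methods build on. It is not DeepComBat and not the full ComBat method: it assumes a single feature, ignores biological covariates, and omits empirical Bayes shrinkage and the conditional variational autoencoder entirely. The function name `location_scale_harmonize` and the toy values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def location_scale_harmonize(values, batches):
    """Align each batch's mean and SD to the pooled mean and SD (one feature).

    Simplified illustration only: no covariates, no empirical Bayes shrinkage,
    and no deep learning component.
    """
    pooled_mean = mean(values)
    pooled_sd = pstdev(values)
    harmonized = [0.0] * len(values)
    for batch in set(batches):
        # Indices of the observations acquired in this batch
        idx = [i for i, b in enumerate(batches) if b == batch]
        batch_vals = [values[i] for i in idx]
        b_mean = mean(batch_vals)
        b_sd = pstdev(batch_vals)
        # Guard against a zero-variance batch
        scale = pooled_sd / b_sd if b_sd > 0 else 1.0
        for i in idx:
            # Remove the batch-specific shift, then rescale to the pooled spread
            harmonized[i] = pooled_mean + (values[i] - b_mean) * scale
    return harmonized


# Hypothetical toy values (mm) from two scanners; not data from the study
thickness = [2.4, 2.6, 2.5, 2.9, 3.1, 3.0]
scanner = ["A", "A", "A", "B", "B", "B"]
harmonized = location_scale_harmonize(thickness, scanner)
```

After this adjustment both scanners share the same mean and standard deviation for the feature. Because the correction is applied one feature at a time, it cannot capture relationships between features, which is the limitation the abstract says a multivariate method is meant to address.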

Authors

  • Fengling Hu
    Students, Perelman School of Medicine at University of Pennsylvania, Philadelphia.
  • Alfredo Lucas
    Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Andrew A Chen
    Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States.
  • Kyle Coleman
    Statistical Center for Single-Cell and Spatial Genomics, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
  • Hannah Horng
    Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA.
  • Raymond W S Ng
    Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
  • Nicholas J Tustison
    Department of Radiology and Medical Imaging.
  • Kathryn A Davis
    Penn Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104; Department of Neurology, Hospital of the University of Pennsylvania, Philadelphia, PA 19104.
  • Haochang Shou
    Artificial Intelligence in Biomedical Imaging Laboratory (AIBIL), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  • Mingyao Li
    Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA. mingyao@pennmedicine.upenn.edu.
  • Russell T Shinohara
    Artificial Intelligence in Biomedical Imaging Laboratory (AIBIL), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.