BERNN: Enhancing classification of Liquid Chromatography Mass Spectrometry data with batch effect removal neural networks.

Journal: Nature Communications

Abstract

Liquid Chromatography Mass Spectrometry (LC-MS) is a powerful method for profiling complex biological samples. However, batch effects typically arise from differences in sample processing protocols, experimental conditions, and data acquisition techniques, and they can significantly impair the interpretability of results. Correcting batch effects is crucial for the reproducibility of omics research, but current methods are not optimal for removing batch effects without compressing the genuine biological variation under study. We propose a suite of Batch Effect Removal Neural Networks (BERNN) to remove batch effects in large LC-MS experiments, with the goal of maximizing sample classification performance between conditions. Importantly, these models must also generalize efficiently to batches not seen during training. A comparison of batch effect correction methods across five diverse datasets showed that BERNN models consistently achieved the strongest sample classification performance. However, the model producing the greatest classification improvement did not always perform best in terms of batch effect removal. Finally, we show that overcorrecting batch effects resulted in the loss of some essential biological variability. These findings highlight the importance of balancing batch effect removal against the preservation of valuable biological diversity in large-scale LC-MS experiments.
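The requirement that models generalize to batches not seen during training is commonly assessed with a leave-one-batch-out split: each batch is held out in turn as the test set while the model is trained on the remaining batches. The sketch below illustrates only that splitting idea; it is an assumption for illustration, not BERNN's documented evaluation protocol, and the function name `leave_one_batch_out` is hypothetical.

```python
from collections import defaultdict


def leave_one_batch_out(batch_labels):
    """Hypothetical helper: yield (held_out_batch, train_idx, test_idx).

    Each distinct batch label is held out once as the test set; all samples
    from the other batches form the training set, so the test batch is never
    seen during training.
    """
    # Group sample indices by their batch label.
    by_batch = defaultdict(list)
    for idx, batch in enumerate(batch_labels):
        by_batch[batch].append(idx)

    # Hold out one batch at a time, in a deterministic (sorted) order.
    for held_out in sorted(by_batch):
        test_idx = by_batch[held_out]
        train_idx = [i for i, b in enumerate(batch_labels) if b != held_out]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    batches = ["b1", "b1", "b2", "b3", "b2"]
    for held_out, train_idx, test_idx in leave_one_batch_out(batches):
        print(held_out, train_idx, test_idx)
```

In practice, scikit-learn's `LeaveOneGroupOut` provides the same kind of split when batch labels are passed as `groups`; the plain-Python version above just makes the logic explicit.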

Authors

  • Simon J Pelletier
    Computational Biology Laboratory, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada.
  • Mickaël Leclercq
    Computational Biology Laboratory, CHU de Québec - Université Laval Research Center, Québec City, Québec, Canada.
  • Florence Roux-Dalvai
    Proteomics platform, CHU de Québec - Université Laval Research Center, Québec City, Québec, Canada.
  • Matthijs B de Geus
    Massachusetts General Hospital Department of Neurology, Charlestown, MA, USA.
  • Shannon Leslie
    Yale Department of Psychiatry, New Haven, CT, USA.
  • Weiwei Wang
  • TuKiet T Lam
    Keck MS & Proteomics Resource, Yale School of Medicine, New Haven, CT, USA.
  • Angus C Nairn
    Yale Department of Psychiatry, New Haven, CT, USA.
  • Steven E Arnold
    Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
  • Becky C Carlyle
    Massachusetts General Hospital Department of Neurology, Charlestown, MA, USA.
  • Frédéric Precioso
    Université Côte d'Azur, CNRS, INRIA, I3S, Sophia Antipolis, France.
  • Arnaud Droit
    Proteomics platform, CHU de Québec - Université Laval Research Center, Québec City, Québec, Canada; Computational Biology Laboratory, CHU de Québec - Université Laval Research Center, Québec City, Québec, Canada; Département de Médecine Moléculaire, Faculté de médecine, Université Laval, Québec City, QC, Canada. Electronic address: arnaud.droit@crchuq.ulaval.ca.