Using graph convolutional neural networks to learn a representation for glycans

Journal: Cell reports
Published Date:

Abstract

As the only nonlinear and the most diverse biological sequence, glycans offer substantial challenges for computational biology. These carbohydrates participate in nearly all biological processes, from protein folding to viral cell entry, yet are still not well understood. There are few computational methods to link glycan sequences to functions, and they do not fully leverage all available information about glycans. SweetNet is a graph convolutional neural network that uses graph representation learning to facilitate a computational understanding of glycobiology. SweetNet explicitly incorporates the nonlinear nature of glycans and establishes a framework to map any glycan sequence to a representation. We show that SweetNet outperforms other computational methods in predicting glycan properties on all reported tasks. More importantly, we show that glycan representations learned by SweetNet are predictive of organismal phenotypic and environmental properties. Finally, we use glycan-focused machine learning to predict viral glycan binding, which can be used to discover viral receptors.
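The central idea, treating a branched glycan as a graph and letting a graph convolutional network turn it into one fixed-length representation, can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions, not SweetNet itself. The toy glycan, the choice to make both monosaccharides and linkages into nodes, the one-hot node features, the symmetric-normalized graph convolution ReLU(D^-1/2 (A + I) D^-1/2 H W), the random weights, and the mean-pooling readout are all illustrative choices, and every function name below is hypothetical.

```python
import math
import random

# Hypothetical toy glycan Man(a1-3)[Man(a1-6)]Man, written as a graph in which
# both monosaccharides and linkages are nodes (an illustrative assumption).
NODES = ["Man", "a1-3", "Man", "a1-6", "Man"]
EDGES = [(0, 1), (1, 4), (2, 3), (3, 4)]  # node 4 is the branching residue


def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list-of-lists matrix."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # treat glycosidic bonds as undirected edges
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph convolution: each node mixes its neighbours' features, then ReLU."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def glycan_representation(nodes, edges, dim=4, layers=2, seed=0):
    """Map a glycan graph to one fixed-length vector (untrained sketch)."""
    vocab = sorted(set(nodes))
    # One-hot node features over the token vocabulary.
    h = [[1.0 if tok == v else 0.0 for v in vocab] for tok in nodes]
    rng = random.Random(seed)  # random, untrained weights for illustration only
    a_hat = normalized_adjacency(len(nodes), edges)
    in_dim = len(vocab)
    for _ in range(layers):
        w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(in_dim)]
        h = gcn_layer(a_hat, h, w)
        in_dim = dim
    # Mean pooling yields one vector per glycan, whatever its size or branching.
    return [sum(col) / len(h) for col in zip(*h)]


print(glycan_representation(NODES, EDGES))
```

Because the convolution only follows edges and the readout averages over nodes, the same glycan yields the same vector however its nodes are numbered, and branching is encoded directly in the adjacency rather than flattened into a linear string. In a real model the weights would be learned for a prediction task instead of drawn at random.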

Authors

  • Rebekka Burkholz
    Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA.
  • John Quackenbush
    Department of Biostatistics & Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA.
  • Daniel Bojar
    Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Department of Biological Engineering and Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.