Signal Peptides Generated by Attention-Based Neural Networks.

Journal: ACS Synthetic Biology
PMID:

Abstract

Signal peptides (SPs) are short (15-30 residue) chains of amino acids at the amino termini of expressed proteins that specify secretion in living cells. We trained an attention-based neural network, the Transformer model, on data from all available organisms in Swiss-Prot to generate SP sequences. Experimental testing demonstrates that the model-generated SPs are functional: when appended to enzymes expressed in an industrial strain, the SPs lead to secreted activity that is competitive with industrially used SPs. Additionally, the model-generated SPs are diverse in sequence, sharing as little as 58% sequence identity with the closest known native signal peptide and 73% ± 9% on average.
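The diversity figures above depend on how "sequence identity to the closest known native signal peptide" is computed, which the abstract does not specify. The following is a minimal sketch of one common definition, assuming a Needleman-Wunsch global alignment with simple match/mismatch/gap scores and identity taken as identical aligned positions divided by alignment length. The function names, scoring values, and toy sequences are illustrative assumptions and are not taken from the paper.

```python
from typing import Iterable, Tuple


def percent_identity(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity from a global alignment (assumed definition:
    identical aligned positions / alignment length, gaps included)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting matches and columns.
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return 100.0 * matches / length if length else 0.0


def closest_native(generated: str, natives: Iterable[str]) -> Tuple[str, float]:
    """Return the native sequence with the highest identity to `generated`."""
    return max(((s, percent_identity(generated, s)) for s in natives),
               key=lambda pair: pair[1])


# Toy strings for illustration only; these are not real signal peptides.
toy_natives = ["MKKL", "AAAA"]
best_seq, best_identity = closest_native("MKAL", toy_natives)
print(best_seq, best_identity)
```

Other conventions (for example, dividing by the shorter sequence length, or using a substitution matrix such as BLOSUM62 with affine gap penalties) would give different percentages, so reported values are only comparable under the same definition.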

Authors

  • Zachary Wu
    Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA.
  • Kevin K Yang
    Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States.
  • Michael J Liszka
    BASF Enzymes, San Diego, California 92121, United States.
  • Alycia Lee
    Department of Computational and Mathematical Sciences, California Institute of Technology, Pasadena, California 91125, United States.
  • Alina Batzilla
    BASF Enzymes, San Diego, California 92121, United States.
  • David Wernick
    BASF Enzymes, San Diego, California 92121, United States.
  • David P Weiner
    BASF Enzymes, San Diego, California 92121, United States.
  • Frances H Arnold
    Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States.