Transfer learning for cross-context prediction of protein expression from 5'UTR sequence.

Journal: Nucleic acids research
Published Date:

Abstract

Model-guided DNA sequence design can accelerate the reprogramming of living cells. It allows us to engineer more complex biological systems by removing the need to physically assemble and test each potential design. While mechanistic models of gene expression have seen some success in supporting this goal, data-centric, deep learning-based approaches often provide more accurate predictions. This accuracy, however, comes at a cost: a lack of generalization across genetic and experimental contexts, which has limited their use outside the context in which they were trained. Here, we address this issue by demonstrating how a simple transfer learning procedure can effectively tune a pre-trained deep learning model to predict protein translation rate from 5' untranslated region (5'UTR) sequence for diverse contexts in Escherichia coli using a small number of new measurements. This allows important model features learnt from expensive massively parallel reporter assays to be easily transferred to new settings. By releasing our trained deep learning model and a complementary calibration procedure, we provide a starting point for continually refined, model-based sequence design that builds on previous knowledge and future experimental efforts.
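The abstract describes tuning a pre-trained model to a new context using only a few new measurements. The sketch below illustrates one common transfer-learning pattern under stated assumptions; it is not necessarily the authors' procedure. The pre-trained layers are kept frozen, and only the output layer is refit by ridge regression on the new data. The feature extractor (a fixed random projection followed by ReLU), the function names (`make_frozen_extractor`, `fit_new_head`, `predict`) and the synthetic measurements are all hypothetical placeholders, not the released model or data.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as a flat one-hot vector (4 entries per position)."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def make_frozen_extractor(seq_len, n_features, seed=0):
    """HYPOTHETICAL stand-in for pre-trained layers: a fixed random projection
    followed by ReLU. These weights are never updated during transfer."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(4 * seq_len)]
               for _ in range(n_features)]

    def extract(seq):
        x = one_hot(seq)
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return extract


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_new_head(extract, seqs, measurements, ridge=1e-2):
    """Refit only the output layer (weights + bias) on a few new-context
    measurements via ridge regression. The bias term is not penalised."""
    rows = [extract(s) + [1.0] for s in seqs]  # append a constant bias feature
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(d - 1):  # penalise feature weights only, not the bias
        xtx[i][i] += ridge
    xty = [sum(r[i] * y for r, y in zip(rows, measurements)) for i in range(d)]
    return solve(xtx, xty)


def predict(extract, head, seq):
    """Frozen features -> newly fitted linear output layer."""
    feats = extract(seq) + [1.0]
    return sum(w * f for w, f in zip(head, feats))


# Usage with SYNTHETIC data (illustration only, not real measurements).
rng = random.Random(1)
extract = make_frozen_extractor(seq_len=20, n_features=8)
seqs = ["".join(rng.choice(BASES) for _ in range(20)) for _ in range(30)]
ys = [0.5 * sum(extract(s)[:3]) + rng.gauss(0.0, 0.1) for s in seqs]
head = fit_new_head(extract, seqs, ys)
print(predict(extract, head, seqs[0]), ys[0])
```

Freezing the shared layers keeps the number of parameters refit in the new context small (here 8 weights plus a bias), which is what makes fitting on only a few dozen new measurements feasible; the ridge penalty further guards against overfitting that small set.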

Authors

  • Pierre-Aurélien Gilliot
    School of Biological Sciences, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK.
  • Thomas E Gorochowski
    School of Biological Sciences, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK; BrisSynBio, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK. Electronic address: thomas.gorochowski@bristol.ac.uk.