Integrating Deep Learning and Synthetic Biology: A Co-Design Approach for Enhancing Gene Expression via N-Terminal Coding Sequences.

Journal: ACS Synthetic Biology
PMID:

Abstract

The N-terminal coding sequence (NCS) influences gene expression by affecting the translation initiation rate. The NCS optimization problem, which is important in genetic engineering, is to find an NCS that maximizes gene expression. However, current methods for NCS optimization, such as rational design and statistics-guided approaches, are labor-intensive and yield only relatively small improvements. This paper introduces a deep learning/synthetic biology co-designed few-shot training workflow for NCS optimization. Our method uses k-nearest encoding followed by word2vec to encode the NCS, performs feature extraction using attention mechanisms, and constructs a time-series network to predict gene expression intensity; finally, a direct search algorithm identifies the optimal NCS with limited training data. We took green fluorescent protein (GFP) expressed by as a reporter protein for NCSs and employed the fluorescence enhancement factor as the metric of NCS optimization. Within just six iterative experiments, our model generated an NCS (MLD) that increased average GFP expression by 5.41-fold, outperforming state-of-the-art NCS designs. Extending our findings beyond GFP, we showed that our engineered NCS (MLD) can effectively boost the production of N-acetylneuraminic acid by enhancing the expression of the crucial rate-limiting gene, demonstrating its practical utility. We have open-sourced our NCS expression database and experimental procedures for public use.
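
The encoding step named above is not specified in detail in this abstract. As a minimal sketch only, assuming it can be approximated by splitting each NCS into overlapping windows of k nucleotides that a word2vec-style model could then embed, the tokenization might look like the following. The function names `window_tokens` and `build_vocab`, the window length, and the example sequences are hypothetical illustrations, not the authors' implementation or data.

```python
def window_tokens(seq, k=3, stride=1):
    """Split a nucleotide string into overlapping windows of length k.

    Assumption: this windowing stands in for the paper's encoding step,
    whose exact definition is not given in the abstract.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(token_lists):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


# Made-up example sequences, for illustration only.
ncs_examples = ["ATGAAAGCT", "ATGGCTAAA"]

# Each sequence becomes a "sentence" of tokens, the input shape that
# word2vec-style embedding models expect.
tokenized = [window_tokens(s) for s in ncs_examples]
vocab = build_vocab(tokenized)
ids = [[vocab[t] for t in tokens] for tokens in tokenized]
```

The token lists in `tokenized` could be passed to any word2vec implementation to obtain dense vectors, and the resulting vector sequence would then feed the attention and time-series stages mentioned in the abstract.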

Authors

  • Zhanglu Yan
    School of Computing, National University of Singapore, Singapore 117417, Singapore.
  • Weiran Chu
    Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China.
  • Yuhua Sheng
    Science Center for Future Foods, Jiangnan University, Wuxi 214122, PR China.
  • Kaiwen Tang
    School of Computing, National University of Singapore, Singapore 117417, Singapore.
  • Shida Wang
    Department of Mathematics, National University of Singapore, Singapore 119077, Singapore.
  • Yanfeng Liu
    Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi 214122, China; Science Center for Future Foods, Ministry of Education, Jiangnan University, Wuxi 214122, China.
  • Weng-Fai Wong
    School of Computing, National University of Singapore, Singapore 117417, Singapore.