A deep learning model trained on expressed transcripts across different tissue types reveals cell-type codon-optimization preferences.
Journal: Nucleic Acids Research
PMID: 40156867
Abstract
Species-specific differences in protein translation can affect the design of protein-based drugs. Consequently, efficient expression of recombinant proteins often requires codon optimization. Publicly available optimization tools do not always increase expression levels and can instead lead to protein misfolding and reduced expression. Here, we aimed to develop a novel deep learning (DL) tool using a recurrent neural network (RNN) to define cell type-dependent codon biases. Using gene expression data from three different tissue types (brain, liver, and muscle) and all secretory genes, we trained DL models to predict optimal codon usage. Codon-optimized sequences for test reporter genes exhibited enhanced protein expression compared to their original sequences and to sequences optimized using a publicly available tool. Interestingly, DL models trained on genes expressed in liver cells (hepatocytes) resulted in the highest levels of expression when tested in vitro, irrespective of the cell type. Our findings also demonstrate that DL-based codon optimization algorithms can significantly enhance protein translation, particularly for secretory proteins, which are crucial for therapeutic applications. This research represents a novel approach to codon optimization with broader implications for protein-based pharmaceuticals, vaccine manufacturing, gene therapy, and other recombinant DNA products.
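To make the idea of a tissue-dependent codon bias concrete, the following is a minimal Python sketch of a much simpler frequency-based baseline. It is not the RNN model described in the abstract. It counts, for each amino acid, which synonymous codon appears most often in a set of coding sequences, then back-translates a protein using those preferred codons. The function names (`codon_usage`, `optimize`, `translate`) and the two toy "liver" sequences are hypothetical and chosen only for illustration. A real analysis would use coding sequences of genes highly expressed in the tissue of interest.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reverse map: amino acid -> list of synonymous codons (in table order).
AA_TO_CODONS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS[aa].append(codon)


def codon_usage(cds_list):
    """Count codon occurrences per amino acid across in-frame coding sequences."""
    counts = defaultdict(Counter)
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = cds[i:i + 3]
            aa = CODON_TO_AA.get(codon)  # None for codons with non-ACGT symbols
            if aa is not None and aa != "*":
                counts[aa][codon] += 1
    return counts


def optimize(protein, usage):
    """Back-translate a protein, picking the most frequent codon per residue.

    Residues never seen in the training sequences fall back to the first
    synonymous codon in table order (an arbitrary choice for this sketch).
    """
    codons = []
    for aa in protein.upper():
        if aa in usage:
            codons.append(usage[aa].most_common(1)[0][0])
        elif aa in AA_TO_CODONS and aa != "*":
            codons.append(AA_TO_CODONS[aa][0])
        else:
            raise ValueError(f"unknown amino acid: {aa!r}")
    return "".join(codons)


def translate(dna):
    """Translate an in-frame DNA sequence to a protein string."""
    dna = dna.upper()
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


# Toy, made-up sequences standing in for liver-expressed transcripts.
liver_cds = ["ATGGCCCTGGCC", "ATGGCCCTGGCT"]
liver_usage = codon_usage(liver_cds)

optimized = optimize("MAL", liver_usage)
print(optimized, translate(optimized))
```

A per-residue frequency table like this ignores sequence context. A recurrent model, as used in the study, can in principle condition each codon choice on neighbouring codons, which a lookup table cannot do.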