UTR-Insight: integrating deep learning for efficient 5' UTR discovery and design.

Journal: BMC Genomics
Published Date:

Abstract

The 5' UTR is critical for mRNA stability and translation efficiency in therapeutics. We developed UTR-Insight, a model that integrates a pretrained language model with a CNN-Transformer architecture. It explains 89.1% of the variation in mean ribosome load (MRL) for random 5' UTRs and 82.8% for endogenous 5' UTRs, surpassing existing models. Using UTR-Insight, we performed high-throughput in silico screening of hundreds of thousands of endogenous 5' UTRs from primates, mice, and viruses. The screened sequences increased protein expression by up to 319% relative to the human α-globin 5' UTR, and UTR-Insight-designed sequences achieved even higher expression than high-performing endogenous 5' UTRs.
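The two headline figures in the abstract can be read as simple arithmetic. The sketch below assumes that "explaining X% of the MRL variation" refers to the coefficient of determination (R²) between measured and predicted MRL, and that "increased by up to 319%" is a relative increase over the reference 5' UTR. The abstract does not state either definition explicitly, and some studies report the squared Pearson correlation instead. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed metric)."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(measured) / len(measured)
    # Residual sum of squares: error left after using the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    # Total sum of squares: variation of the measurements around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    if ss_tot == 0:
        raise ValueError("measured values have zero variance")
    return 1.0 - ss_res / ss_tot


def percent_increase(candidate, reference):
    """Relative increase of a candidate over a reference, in percent."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (candidate - reference) / reference * 100.0


if __name__ == "__main__":
    # Toy MRL values for illustration only (not data from the study).
    measured_mrl = [3.1, 4.8, 6.2, 5.5, 7.0]
    predicted_mrl = [3.4, 4.5, 6.0, 5.9, 6.8]
    print(f"R^2 = {r_squared(measured_mrl, predicted_mrl):.3f}")

    # Under this reading, a 319% increase means the candidate expresses
    # 4.19 times as much protein as the reference.
    print(f"increase = {percent_increase(4.19, 1.0):.0f}%")
```

Under this reading, an R² of 0.891 means that 10.9% of the variance in measured MRL is left unexplained by the model's predictions.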

Authors

  • Saichao Pan
    Shenzhen Rhegen Biotechnology Co. Ltd, Shenzhen, Guangdong, China.
  • Hanyu Wang
    School of Semiconductor Science and Technology, South China Normal University, Foshan, 528225, P.R. China.
  • Hang Zhang
    Department of Cardiology, Rui Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
  • Zan Tang
    Shenzhen Rhegen Biotechnology Co. Ltd, Shenzhen, Guangdong, China.
  • Lianqiang Xu
    Shenzhen Rhegen Biotechnology Co. Ltd, Shenzhen, Guangdong, China.
  • Zhixiang Yan
    Shenzhen Rhegen Biotechnology Co. Ltd, Shenzhen, Guangdong, China. zhixiang.yan@rhegen.com.
  • Yong Hu
    Big Data Decision Institute, Jinan University, Guangzhou, China.