Predicting and Designing Red Fluorescent Protein Variants Using Sequence-to-Function Machine Learning Models

Journal: bioRxiv
Published Date:

Abstract

Fluorescent proteins (FPs) are widely used reporters for visualizing cellular structures and processes. Traditional wet-lab strategies for FP engineering (rational design and directed evolution) have enabled substantial improvements in photophysical performance but are limited by their need for deep expert knowledge or labor-intensive screening. AI-driven approaches have recently gained traction for engineering variants of green FPs, yet applications to red fluorescent proteins (RFPs) remain scarce. Here, we trained machine learning models on an RFP sequence-function dataset to predict functional single-mutation variants of the state-of-the-art RFP mScarlet-I3. Guided by model predictions, we identified variants exhibiting red-shifted emission peaks, large Stokes shifts, or brightness comparable to the parental protein. Our findings show that even lightweight, data-efficient models can extract actionable design principles for improving RFPs. This work demonstrates the feasibility of AI-guided design for RFPs and provides a reliable benchmark for future development of more powerful AI-driven strategies for FP engineering.

Authors

  • Ran Ji; Jean Jung; Howard Cheng; Ella Y. Xu; Audrey Wang; Keith Pardee; Yufeng Zhao