Machine learning-aided scoring of synthesis difficulties for designer chromosomes.

Journal: Science China. Life sciences
PMID:

Abstract

Designer chromosomes are artificially synthesized chromosomes. These chromosomes now have numerous applications, ranging from medical research to biofuel development. However, some chromosome fragments can interfere with the chemical synthesis of designer chromosomes, ultimately limiting the widespread use of this technology. To address this issue, this study aimed to develop an interpretable machine learning framework to predict and quantify the synthesis difficulties of designer chromosomes in advance. Using this framework, six key sequence features that lead to synthesis difficulties were identified, and an eXtreme Gradient Boosting (XGBoost) model was built to integrate them. The model achieved an AUC of 0.895 in cross-validation and 0.885 on an independent test set. Based on these results, the synthesis difficulty index (S-index) was proposed to score and interpret the synthesis difficulties of chromosomes from prokaryotes to eukaryotes. The findings highlight the substantial variability in synthesis difficulty between chromosomes and demonstrate the potential of the proposed model to predict and mitigate these difficulties by optimizing the synthesis process and rewriting genomes.
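The model's performance above is reported as AUC (area under the ROC curve). As an illustration only, the sketch below shows what that number measures, using the rank-based definition: the probability that a randomly chosen positive example (here, hypothetically, a "difficult to synthesize" fragment) receives a higher predicted score than a randomly chosen negative one, with ties counted as half. The function name `roc_auc` and the toy labels and scores are invented for this example; this is not the authors' evaluation pipeline, and it says nothing about their features or the S-index.

```python
from itertools import product


def roc_auc(labels, scores):
    """Hypothetical helper: ROC AUC as the fraction of (positive, negative)
    pairs in which the positive gets the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # positive ranked above negative
        elif p == n:
            wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy, made-up data: 1 = "difficult to synthesize", 0 = "easy".
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))
```

On this toy data, 8 of the 9 positive/negative pairs are ranked correctly (only 0.4 vs 0.5 is not), so the function returns 8/9, about 0.889. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic and meant for readability; library implementations compute the same quantity from sorted ranks.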

Authors

  • Yan Zheng
    School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
  • Kai Song
    College of Information and Computer, Taiyuan University of Technology, Taiyuan, China.
  • Ze-Xiong Xie
    Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin University, Tianjin, 300072, China.
  • Ming-Zhe Han
    Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin University, Tianjin, 300072, China.
  • Fei Guo
    School of Electronic Information Engineering, Tianjin University, Tianjin, 300072, China. Electronic address: gfjy001@yahoo.com.
  • Ying-Jin Yuan
    Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin University, Tianjin, 300072, China. yjyuan@tju.edu.cn.