TDFFM: Transformer and Deep Forest Fusion Model for Predicting Coronavirus 3C-Like Protease Cleavage Sites.

Journal: IEEE/ACM transactions on computational biology and bioinformatics
Published Date:

Abstract

COVID-19, caused by the highly contagious SARS-CoV-2 virus, is distinguished by its positive-sense, single-stranded RNA genome. A thorough understanding of SARS-CoV-2 pathogenesis is crucial for halting its proliferation. Notably, the 3C-like protease of the coronavirus (denoted as 3CL) is instrumental in the viral replication process. Precise delineation of 3CL cleavage sites is imperative for elucidating the transmission dynamics of SARS-CoV-2. While machine learning tools have been deployed to identify potential 3CL cleavage sites, these existing methods often fall short in terms of accuracy. To improve the performance of these predictions, we propose a novel analytical framework, the Transformer and Deep Forest Fusion Model (TDFFM). Within TDFFM, we use the AAindex and the BLOSUM62 matrix to encode protein sequences. The encoded features are then fed into two distinct components: a Deep Forest, which is an effective decision tree ensemble method, and a Transformer equipped with a Multi-Level Attention Model (TMLAM). The attention mechanism allows the model to identify positive samples more accurately, thus enhancing the overall predictive performance. Evaluation on a test set shows that TDFFM achieves an accuracy of 0.955, an AUC of 0.980, and an F1-score of 0.367, substantiating the model's superior prediction capabilities.
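
The abstract reports accuracy alongside an F1-score, and the two can diverge sharply when positive samples (here, cleavage sites) are rare. The following minimal sketch shows how both metrics are computed from confusion-matrix counts. The function name and all counts are hypothetical illustrations, not values from the paper, and the sketch does not reproduce the authors' evaluation code.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical, heavily imbalanced test set: 20 true cleavage sites
# among 1000 candidate positions (assumed numbers for illustration only).
metrics = classification_metrics(tp=10, fp=30, tn=950, fn=10)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, the large number of true negatives keeps accuracy high, while the F1-score depends only on how the positive class is handled, so it stays much lower. This is one reason both figures are usually reported together for cleavage-site prediction.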

Authors

  • Qingsong Wang
    Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, 300072, China. Electronic address: wqs_bme@tju.edu.cn.
  • Ruiquan Ge
  • Changmiao Wang
    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Boulevard, Shenzhen 518055, China; University of Chinese Academy of Sciences, 52 Sanlihe Road, Beijing 100864, China.
  • Ahmed Elazab
    Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Boulevard, Shenzhen 518055, China; University of Chinese Academy of Sciences, 52 Sanlihe Road, Beijing 100864, China.
  • Qiming Fang
    School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China.
  • Renfeng Zhang