ctPISP: Protein-Protein Interaction Sites Prediction Using Convolution and Transformer With Data Augmentation.

Journal: IEEE/ACM transactions on computational biology and bioinformatics
Published Date:

Abstract

Protein-protein interactions are the basis of many cellular biological processes, such as cellular organization, signal transduction, and immune response. Identifying protein-protein interaction sites is essential for understanding the mechanisms of various biological processes, disease development, and drug design. However, accurate prediction remains challenging, because the small amount of training data and severe class imbalance reduce the performance of computational methods. We design a deep learning method named ctPISP to improve the prediction of protein-protein interaction sites. ctPISP employs convolution and Transformer to extract information and enhance information perception, so that semantic features can be mined to identify protein-protein interaction sites. A weighted loss function with different sample weights is designed to suppress the preference of the model toward multi-category prediction. To reuse the information in the training set efficiently, a data augmentation preprocessing step with an improved sample-oriented sampling strategy is applied. The trained ctPISP was evaluated against current state-of-the-art methods on six public datasets. The results show that ctPISP outperforms all competing methods on the balanced metrics F1, MCC, and AUPRC. In particular, our predictions on open tests related to viruses may also be consistent with biological insights. The source code and data can be obtained from https://github.com/lennylv/ctP2ISP.
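
To illustrate the two ideas the abstract relies on (a loss with per-sample weights, and imbalance-aware metrics such as F1 and MCC), here is a minimal sketch in plain Python. It is not the authors' implementation: the inverse-frequency weighting scheme, the function names, and the toy labels are all assumptions made for illustration, and the abstract does not state how the sample weights in ctPISP are actually chosen.

```python
import math

def class_weights(labels):
    """Assumed scheme: inverse-frequency weights, n / (2 * class_count).
    The rarer class (here, interaction sites) gets the larger weight."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return {1: n / (2 * n_pos), 0: n / (2 * n_neg)}

def weighted_bce(labels, probs, weights, eps=1e-12):
    """Binary cross-entropy where each residue's loss is scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)

def f1_and_mcc(labels, preds):
    """F1 and Matthews correlation coefficient from binary labels and predictions."""
    tp = sum(1 for y, q in zip(labels, preds) if y == 1 and q == 1)
    tn = sum(1 for y, q in zip(labels, preds) if y == 0 and q == 0)
    fp = sum(1 for y, q in zip(labels, preds) if y == 0 and q == 1)
    fn = sum(1 for y, q in zip(labels, preds) if y == 1 and q == 0)
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, mcc

# Toy example (hypothetical data): 1 interaction site among 4 residues.
labels = [1, 0, 0, 0]
weights = class_weights(labels)
print(weights)
print(f1_and_mcc(labels, [0, 0, 0, 0]))  # always predicting "non-site"
```

On the toy labels, a model that always predicts the non-interacting class is correct for three of four residues, yet its F1 and MCC are both zero. This is why such metrics are preferred over accuracy on imbalanced data, and why a loss that weights the rare class more heavily can help.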

Authors

  • Kailong Li
    School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
  • Lijun Quan
    School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
  • Yelu Jiang
    School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
  • Yan Li
    Interdisciplinary Research Center for Biology and Chemistry, Liaoning Normal University, Dalian, China.
  • Yiting Zhou
    School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
  • Tingfang Wu
    Key Laboratory of Image Information Processing and Intelligent Control of Education Ministry of China, School of Automation, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China.
  • Qiang Lyu
    Department of Computer Science and Technology, Soochow University, Suzhou, Jiangsu, 215006, China.