TransPTM: a transformer-based model for non-histone acetylation site prediction.

Journal: Briefings in bioinformatics
PMID:

Abstract

Protein acetylation is one of the most extensively studied post-translational modifications (PTMs) owing to its significant roles across a myriad of biological processes. Although many computational tools for acetylation site identification have been developed, benchmark datasets and bespoke predictors for non-histone acetylation site prediction are lacking. To address these problems, we contribute both a dataset and a predictor benchmark in this study. First, we construct a non-histone acetylation site benchmark dataset, namely NHAC, which includes 11 subsets according to sequence length, ranging from 11 to 61 amino acids. Each sequence length has a total of 886 positive samples and 4707 negative samples. Second, we propose TransPTM, a transformer-based neural network model for non-histone acetylation site prediction. During the data representation phase, per-residue contextualized embeddings are extracted using ProtT5 (an existing pre-trained protein language model). This is followed by a graph neural network framework, which consists of three TransformerConv layers for feature extraction and a multilayer perceptron module for classification. The benchmark results show that TransPTM achieves competitive performance for non-histone acetylation site prediction compared with three state-of-the-art tools. It improves our understanding of the PTM mechanism and provides a theoretical basis for developing drug targets for diseases. Moreover, the created PTM dataset fills the gap in non-histone acetylation site datasets and is beneficial to the related communities. The related source code and data utilized by TransPTM are accessible at https://www.github.com/TransPTM/TransPTM.
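
The fixed-length samples described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that candidate sites are lysine (K) residues placed at the center of each window, and that windows running past the sequence ends are padded. The abstract states none of this. The function name `extract_windows`, the `flank` parameter, the padding symbol `X`, and the toy sequence are all hypothetical.

```python
def extract_windows(sequence, flank, pad="X"):
    """Return (1-based position, window) pairs for every lysine in `sequence`.

    Each window holds `flank` residues on either side of the lysine, so its
    length is 2 * flank + 1. Centering on K and padding with `pad` are
    assumptions made for illustration only.
    """
    # Pad both ends so that lysines near the termini still yield full windows.
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i sits at index i + flank in the padded string, so the
            # centered window is padded[i : i + 2 * flank + 1].
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    # Toy sequence (hypothetical); flank=5 gives windows of 11 residues.
    for position, window in extract_windows("MKTAYIAKQR", flank=5):
        print(position, window)
```

Under these assumptions, `flank=5` yields the shortest window length mentioned in the abstract (11 residues) and `flank=30` yields the longest (61 residues).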

Authors

  • Lingkuan Meng
    Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong.
  • Xingjian Chen
    School of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China.
  • Ke Cheng
    School of Computer Science and Engineering, Jiangsu University of Science and Technology, No. 2 Mengxi Road, Zhenjiang 212003, China.
  • Nanjun Chen
    Department of Computer Science, City University of Hong Kong, Hong Kong, China.
  • Zetian Zheng
    Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR.
  • Fuzhou Wang
    Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong.
  • Hongyan Sun
    School of Mechanical Engineering and Automation, Beihang University, Beijing, 100191, China.
  • Ka-Chun Wong