The accurate prediction and characterization of cancerlectin by a combined machine learning and GO analysis.
Journal:
Briefings in bioinformatics
PMID:
34113984
Abstract
Cancerlectins, lectins linked to tumor progression, have become the focus of cancer therapy research for their carbohydrate-binding specificity. However, the specific characterization for cancerlectins involved in tumor progression is still unclear. By taking advantage of the g-gap tripeptide and tetrapeptide composition feature descriptors, we increased the accuracy of the classification model of cancerlectin and lectin to 98.54% and 95.38%, respectively. About 36 cancerlectin and 135 lectin features were selected for functional characterization by P/N feature ranking method, which particularly selects the features in positive samples. The specific protein domains of cancerlectins are found to be p-GalNAc-T, crystal and annexin by comparing with lectins through the exclusion method. Moreover, the combined GO analysis showed that the conserved cation binding sites of cancerlectin specific domains are covered by selected feature peptides, suggesting that the capability of cation binding, critical for enzyme activity and stability, could be the key characteristic of cancerlectins in tumor progression. These results will help to identify potential cancerlectin and provide clues for mechanism study of cancerlectin in tumor progression.