Identification of DEP domain-containing proteins by a machine learning method and experimental analysis of their expression in human HCC tissues.
Journal:
Scientific Reports
Published Date:
Dec 21, 2016
Abstract
The Dishevelled/EGL-10/Pleckstrin (DEP) domain-containing (DEPDC) proteins have seven members. However, whether this superfamily can be distinguished from other proteins based only on amino acid sequence remains unknown. Here, we describe a computational method to separate DEPDCs from non-DEPDCs. First, we examined the Pfam numbers of the known DEPDCs and used the longest sequence for each Pfam to construct a phylogenetic tree. Subsequently, we extracted 188-dimensional (188D) and 20D features of DEPDCs and non-DEPDCs and classified them with a random forest classifier. We also mined the motifs of human DEPDCs to find the related domains. Finally, we designed experimental methods to verify human DEPDC expression at the mRNA level in hepatocellular carcinoma (HCC) and adjacent normal tissues. The phylogenetic analysis showed that the DEPDC superfamily can be divided into three clusters. Moreover, both the 188D and the 20D features can be used to effectively distinguish the two protein types. Motif analysis revealed that the DEP and RhoGAP domains were common in human DEPDCs. DEPDCs were widely expressed in human HCC and the adjacent tissues; however, their regulation was not identical. In conclusion, we successfully constructed a binary classifier for DEPDCs and experimentally verified their expression in human HCC tissues.
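The abstract does not define the 20D features. A common reading, assumed here, is the amino acid composition: the fraction of each of the 20 standard residues in a sequence. The minimal sketch below illustrates that assumption only; the function name `composition_20d` and the toy sequence are hypothetical and are not taken from the paper. Vectors of this kind could then be passed to any random forest implementation.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_20d(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Assumption: "20D features" means amino acid composition. Characters
    outside the 20 standard residues (e.g. X, B, gaps) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard sequence: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence for illustration only, not a real DEPDC protein.
features = composition_20d("ACDA")
print(len(features), features[0])
```

Each vector has a fixed length of 20 regardless of protein length, which is what allows sequences of different sizes to be fed to the same classifier.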
Authors
Keywords
Algorithms
Amino Acid Motifs
Carcinoma, Hepatocellular
Cluster Analysis
Computational Biology
Dishevelled Proteins
Gene Expression Profiling
Gene Expression Regulation, Neoplastic
GTPase-Activating Proteins
Humans
Liver Neoplasms
Machine Learning
Phylogeny
Protein Domains
Real-Time Polymerase Chain Reaction
RNA, Messenger
Signal Transduction