Identification of DEP domain-containing proteins by a machine learning method and experimental analysis of their expression in human HCC tissues.

Journal: Scientific Reports
Published Date:

Abstract

The Dishevelled/EGL-10/Pleckstrin (DEP) domain-containing (DEPDC) proteins have seven members. However, whether this superfamily can be distinguished from other proteins based only on amino acid sequence remains unknown. Here, we describe a computational method to segregate DEPDCs and non-DEPDCs. First, we examined the Pfam numbers of the known DEPDCs and used the longest sequence for each Pfam to construct a phylogenetic tree. Subsequently, we extracted 188-dimensional (188D) and 20D features of DEPDCs and non-DEPDCs and classified them with a random forest classifier. We also mined the motifs of human DEPDCs to find the related domains. Finally, we designed experimental methods to verify human DEPDC expression at the mRNA level in hepatocellular carcinoma (HCC) and adjacent normal tissues. The phylogenetic analysis showed that the DEPDC superfamily can be divided into three clusters. Moreover, the 188D and 20D features can both be used to effectively distinguish the two protein types. Motif analysis revealed that the DEP and RhoGAP domains were common in human DEPDCs, and that DEPDCs were widely expressed in human HCC and adjacent tissues. However, their regulation was not identical. In conclusion, we successfully constructed a binary classifier for DEPDCs and experimentally verified their expression in human HCC tissues.
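
As an illustration of sequence-only feature extraction, the sketch below computes a 20-dimensional vector of amino acid frequencies from a protein sequence. It assumes that the 20D features are plain amino acid composition, which the abstract does not state; the function name `composition_20d` and the example sequence are hypothetical and not taken from the paper. Vectors of this kind could then be passed to a random forest classifier.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_20d(sequence: str) -> List[float]:
    """Return the frequency of each standard amino acid in `sequence`.

    Assumption: "20D features" means amino acid composition. Characters
    outside the 20 standard residues (e.g. X, B, Z) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard sequence: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, used only to show the shape of the output.
features = composition_20d("MKTAYIAKQR")
print(len(features))
```

Each position in the vector corresponds to one residue in `AMINO_ACIDS`, so the frequencies of a non-empty sequence sum to 1.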

Authors

  • Zhijun Liao
    Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian 350122, China.
  • Xinrui Wang
    State Key Laboratory for Medical Genomics, Shanghai Institute of Hematology, Rui-Jin Hospital affiliated to School of Medicine, Shanghai Jiao Tong University, Shanghai 200025, China.
  • Yeting Zeng
    Department of Pathology, Dongfang Hospital, Fuzhou 350025, China.
  • Quan Zou