Unsupervised Learning Framework With Multidimensional Scaling in Predicting Epithelial-Mesenchymal Transitions.

Journal: IEEE/ACM transactions on computational biology and bioinformatics
PMID:

Abstract

Clustering tumor metastasis samples from genome-wide gene expression data remains challenging, particularly when the number of experimental samples is small and the number of genes is large. Rather than tumor metastasis itself, we focus on predicting the epithelial-mesenchymal transition (EMT), an underlying mechanism of tumor metastasis, to avoid the confounding effects of uncertainties arising from various factors. In this paper, we propose a novel model for predicting EMT based on multidimensional scaling (MDS) strategies, integrating entropy and random matrix detection strategies to determine the optimal number of dimensions of the reduced low-dimensional space. We verified the proposed model on gene expression data for EMT samples of breast cancer, and the experimental results demonstrated its superiority over state-of-the-art clustering methods. Furthermore, we developed a novel feature extraction method for selecting significant genes and predicting tumor metastasis. The source code is available at "https://github.com/yushanqiu/yushan.qiu-szu.edu.cn".
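
As an illustration of the MDS step mentioned in the abstract, the following is a minimal sketch of classical (Torgerson) MDS applied to a samples-by-genes matrix. It is not the authors' implementation: the function name `classical_mds`, the Euclidean distance, the synthetic data, and the fixed choice of two dimensions are all assumptions made for the example. The entropy and random matrix criteria that the paper uses to choose the number of dimensions are not reproduced here.

```python
import numpy as np


def classical_mds(X, n_components=2):
    """Classical (Torgerson) MDS on the rows of X (samples x genes).

    Hypothetical helper for illustration only; returns the embedded
    coordinates and all eigenvalues of the centered Gram matrix,
    sorted in descending order.
    """
    # Squared Euclidean distances between every pair of samples.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off

    # Double centering turns squared distances into a Gram matrix.
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering

    # Symmetric eigendecomposition; eigh returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Scale the leading eigenvectors by the square roots of their eigenvalues.
    top = np.maximum(eigvals[:n_components], 0.0)
    coords = eigvecs[:, :n_components] * np.sqrt(top)
    return coords, eigvals


# Synthetic stand-in for expression data (assumption): 12 samples, 500 genes,
# with the second group of six samples shifted in mean.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(6, 500))
group_b = rng.normal(2.0, 1.0, size=(6, 500))
X = np.vstack([group_a, group_b])

coords, eigvals = classical_mds(X, n_components=2)
```

In this few-samples, many-genes setting the decomposition is carried out on a 12 x 12 matrix rather than on the gene dimension. The low-dimensional `coords` could then be passed to any clustering routine, and the returned eigenvalue spectrum is the kind of quantity a dimension-selection criterion would inspect.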

Authors

  • Yushan Qiu
    College of Mathematics and Statistics, Shenzhen University, Nanhai Avenue 3688, Shenzhen, 518060, China. yushan.qiu@szu.edu.cn.
  • Hao Jiang
    Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.
  • Wai-Ki Ching
    Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong.