MCluster-VAEs: An end-to-end variational deep learning-based clustering method for subtype discovery using multi-omics data.

Journal: Computers in Biology and Medicine
Published Date:

Abstract

The discovery of cancer subtypes through unsupervised clustering helps provide precise diagnoses, guide treatment, and improve patient prognosis. Compared with single-omics data, multi-omics data can improve clustering performance because they offer a more comprehensive view of biological systems and mechanisms. However, heterogeneous data from multiple sources introduce high complexity and various kinds of noise, which hinder the extraction of clustering information. We propose an end-to-end deep learning-based method, Multi-omics Clustering Variational Autoencoders (MCluster-VAEs), that extracts cluster-friendly representations from multi-omics data. First, a unified network architecture with an attention mechanism was developed to model multi-omics data accurately. Then, the model was trained with a novel objective function derived from the Variational Bayes technique to obtain an effective posterior estimate of the cluster assignments. Compared with 12 other state-of-the-art multi-omics clustering methods, MCluster-VAEs achieved outstanding performance on benchmark datasets from the TCGA database. On the Pan-Cancer dataset, MCluster-VAEs achieved an adjusted Rand index (ARI) of approximately 0.78 for cancer category recognition, an increase of more than 18% compared with the other methods. Furthermore, survival analysis and clinical parameter enrichment tests on 10 cancer datasets showed that MCluster-VAEs provides comparable or even better results than many common integrative approaches. These results demonstrate that MCluster-VAEs is a powerful new tool for dissecting complex multi-omics relationships and offering new insights into cancer subtype discovery.
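The abstract does not spell out the form of the variational objective. A common construction in VAE-based clustering (for example VaDE) places a Gaussian-mixture prior on the latent code and reads the cluster assignment of each sample off the posterior responsibilities p(c | z). The minimal sketch below assumes that setup, which may differ from the actual MCluster-VAEs objective, and then scores the resulting partition with the adjusted Rand index used in the evaluation. The function names, mixture parameters, and synthetic latent codes are hypothetical; in a trained model, `z` would come from the encoder and `pi`, `mu`, `var` would be learned.

```python
import numpy as np
from math import comb


def log_gaussian_diag(z, mu, var):
    """log N(z_i | mu_k, diag(var_k)) for every sample i and component k.

    z: (n, d) latent codes; mu, var: (K, d) means and variances.
    Returns an (n, K) array.
    """
    diff = z[:, None, :] - mu[None, :, :]  # (n, K, d)
    return -0.5 * np.sum(
        np.log(2 * np.pi * var)[None, :, :] + diff**2 / var[None, :, :], axis=2
    )


def cluster_posterior(z, pi, mu, var):
    """Responsibilities gamma_ik = p(c = k | z_i) under an assumed Gaussian-mixture prior."""
    log_joint = np.log(pi)[None, :] + log_gaussian_diag(z, mu, var)  # (n, K)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # stabilise before exponentiating
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert and Arabie) from the contingency table.

    Requires at least two samples.
    """
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    _, t = np.unique(labels_true, return_inverse=True)
    _, p = np.unique(labels_pred, return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=int)
    np.add.at(table, (t, p), 1)  # count co-occurrences of (true, predicted) labels

    n = labels_true.size
    sum_cells = sum(comb(int(x), 2) for x in table.ravel())
    sum_rows = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_cols = sum(comb(int(x), 2) for x in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # degenerate case, e.g. identical single-cluster labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical 2-D latent space with three well-separated clusters.
    rng = np.random.default_rng(0)
    K, d = 3, 2
    mu = np.array([[-5.0, 0.0], [0.0, 5.0], [5.0, 0.0]])
    var = np.full((K, d), 0.5)
    pi = np.full(K, 1.0 / K)
    true = np.repeat(np.arange(K), 50)
    z = mu[true] + rng.normal(scale=np.sqrt(0.5), size=(true.size, d))

    pred = cluster_posterior(z, pi, mu, var).argmax(axis=1)  # hard assignment per sample
    print("ARI:", round(adjusted_rand_index(true, pred), 3))
```

Taking the arg-max of the responsibilities gives a hard partition. The ARI compares that partition with reference labels (such as cancer type) and corrects for chance agreement, so it is invariant to how the cluster indices are numbered.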

Authors

  • Zhiwei Rong
    Department of Epidemiology and Biostatistics, School of Public Health, Harbin Medical University, Harbin 150086, China.
  • Zhilin Liu
    Department of Biostatistics, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing 100000, China.
  • Jiali Song
    Department of Biostatistics, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing 100000, China.
  • Lei Cao
    State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, Liaoning, People's Republic of China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, Liaoning, People's Republic of China; University of Chinese Academy of Sciences, Beijing, People's Republic of China.
  • Yipe Yu
    Department of Biostatistics, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing 100000, China.
  • Mantang Qiu
    Department of Thoracic Surgery, Peking University People's Hospital, Beijing, China.
  • Yan Hou
    Department of Biostatistics, Peking University School of Public Health, No. 38 Xueyuan Road, Haidian District, Beijing 100000, China; Peking University Clinical Research Center, No. 38 Xueyuan Road, Haidian District, Beijing 100000, China. Electronic address: houyan@bjmu.edu.cn.