Dissection of gene expression datasets into clinically relevant interaction signatures via high-dimensional correlation maximization.

Journal: Nature Communications
PMID:

Abstract

Gene expression is controlled by many simultaneous interactions, frequently measured collectively in biology and medicine by high-throughput technologies. Inferring the generating effects and cooperating genes from these data is highly challenging. Here, we present an unsupervised hypothesis-generating learning concept termed signal dissection by correlation maximization (SDCM) that dissects large high-dimensional datasets into signatures. Each signature captures a particular signal pattern that was consistently observed for multiple genes and samples, likely caused by the same underlying interaction. A key difference from other methods is our flexible nonlinear signal superposition model, combined with a precise regression technique. Analyzing gene expression of diffuse large B-cell lymphoma, our method discovers previously unidentified signatures that reveal significant differences in patient survival. These signatures are more predictive than those from various methods used for comparison and validate robustly across technological platforms. This implies highly specific extraction of clinically relevant gene interactions.

Authors

  • Michael Grau
    Department of Medicine A, Albert-Schweitzer Campus 1, University Hospital Münster, 48149, Münster, Germany.
  • Georg Lenz
    Department of Medicine A, Albert-Schweitzer Campus 1, University Hospital Münster, 48149, Münster, Germany.
  • Peter Lenz
    Department of Physics, Renthof 5, University of Marburg, 35032, Marburg, Germany. Peter.Lenz@physik.uni-marburg.de.