Integration of biological data via NMF for identification of human disease-associated gene modules through multi-label classification.

Journal: PloS one
PMID:

Abstract

Proteins associated with multiple diseases often interact, forming disease modules that are critical for understanding disease mechanisms. This study integrates two biological data sources, protein-protein interactions (PPIs) and Gene Ontology (GO) data, using non-negative matrix factorization (NMF) to identify gene modules associated with human diseases and to find connections between novel genes and diseases. Each data source is first converted into a network, which is then clustered to obtain modules. The two types of modules are then integrated through an NMF-based technique to obtain a set of meta-modules that preserve the essential characteristics of the interaction patterns and functional similarity information among the proteins/genes. Each meta-module is labeled based on its statistical and biological properties, and a multi-label classification technique is employed to assign new disease labels to genes. We identified 3,131 gene-disease associations, validated through a literature review, Gene Ontology, and pathway analysis.
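
The integration step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the NMF variant, the input encoding, or the number of meta-modules, so everything below is an assumption made for illustration. The sketch assumes each module set is encoded as a binary gene-by-module membership matrix, that the two matrices are concatenated column-wise, and that a plain NMF with multiplicative updates for the Frobenius objective is used. The names `nmf`, `ppi_modules`, `go_modules`, and `meta_module`, the toy data, and the choice `k=2` are all hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix X (n x m) as W (n x k) @ H (k x m).

    Uses multiplicative updates for the Frobenius-norm objective.
    This is a generic NMF, assumed here for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data (hypothetical): 6 genes, 2 PPI-derived modules, 2 GO-derived modules.
# Entry (i, j) is 1 if gene i belongs to module j.
ppi_modules = np.array(
    [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float
)
go_modules = np.array(
    [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 1]], dtype=float
)

# Assumed integration scheme: stack both module views side by side,
# so each gene is described by its memberships in both sources.
X = np.hstack([ppi_modules, go_modules])

# W gives each gene's weight in each meta-module;
# H relates meta-modules back to the original PPI and GO modules.
W, H = nmf(X, k=2)

# Assign each gene to the meta-module with the largest weight.
meta_module = W.argmax(axis=1)
print(meta_module)
```

In this sketch each gene receives a single meta-module index; the labeling of meta-modules and the multi-label classification step mentioned in the abstract are not covered.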

Authors

  • Syed Alberuni
    Department of Computer Science & Engineering, Aliah University, Kolkata, West Bengal, India.
  • Sumanta Ray
    Department of Computer Science and Engineering, Aliah University, Kolkata, West Bengal, India.