Annotating gene sets by mining large literature collections with protein networks.

Journal: Pacific Symposium on Biocomputing

Published Date: Jan 1, 2018

Abstract

Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
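The core idea in the abstract, scoring phrases by their network distance to a gene set in a single graph that mixes gene-phrase links with protein interactions, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance measure, so unweighted shortest-path (BFS) distance is swapped in here as an assumption, and every gene name, phrase, edge, and function name below is hypothetical toy data.

```python
from collections import deque


def build_graph(gene_gene_edges, gene_phrase_edges):
    """Merge protein interactions and gene-phrase links into one undirected graph."""
    graph = {}
    for a, b in list(gene_gene_edges) + list(gene_phrase_edges):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, source):
    """Unweighted shortest-path distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_phrases(graph, phrases, gene_set):
    """Rank phrases by mean distance to the gene set (smaller is better).

    Assumption: a gene unreachable from a phrase is charged a penalty
    equal to the number of nodes in the graph.
    """
    penalty = len(graph)
    scores = {}
    for phrase in phrases:
        dist = bfs_distances(graph, phrase)
        total = sum(dist.get(gene, penalty) for gene in gene_set)
        scores[phrase] = total / len(gene_set)
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


# Hypothetical toy data: G2 has no literature link of its own, but its
# protein interactions with G1 and G3 connect it to "dna repair".
gene_gene_edges = [("G1", "G2"), ("G2", "G3"), ("G4", "G5")]
gene_phrase_edges = [
    ("G1", "dna repair"),
    ("G3", "dna repair"),
    ("G4", "cell adhesion"),
]
graph = build_graph(gene_gene_edges, gene_phrase_edges)
ranked = rank_phrases(graph, ["dna repair", "cell adhesion"], ["G1", "G2", "G3"])
print(ranked)
```

In this toy graph, "dna repair" ranks first for the set G1, G2, G3 even though G2 has no direct literature link, which is the kind of gap the protein interaction edges are meant to bridge. A real system would likely use weighted edges and a diffusion-style distance rather than hop counts.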

Authors

Sheng Wang
Intensive Care Medical Center, Tongji Hospital, School of Medicine, Tongji University, Shanghai, 200065, People's Republic of China.

Jianzhu Ma
Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave., Chicago, Illinois 60637, USA.

Michael Ku Yu

Fan Zheng

Edward W Huang

Jiawei Han
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA; Institute of Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA.

Jian Peng
Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA.

Trey Ideker

Keywords

Algorithms; Computational Biology; Data Mining; Gene Ontology; Gene Regulatory Networks; Humans; Molecular Sequence Annotation; Natural Language Processing; Neoplasms; Protein Interaction Maps

External Resources

PubMed ID: 29218918
