Generating a focused view of disease ontology cancer terms for pan-cancer data integration and analysis.

Journal: Database : the journal of biological databases and curation
Published Date:

Abstract

Bio-ontologies provide terminologies for the scientific community to describe biomedical entities in a standardized manner. Multiple initiatives are developing biomedical terminologies to provide better annotation, data integration and mining capabilities. Terminology resources devised for multiple purposes inherently diverge in content and structure. A major obstacle to biomedical data integration is the proliferation of overlapping terms, ambiguous classifications and inconsistencies across databases and publications. The Disease Ontology (DO) was developed over the past decade to address data integration, standardization and annotation issues for human disease data. We have established a DO cancer project to provide a focused view of cancer terms within the DO. The DO cancer project mapped 386 cancer terms from the Catalogue of Somatic Mutations in Cancer (COSMIC), The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium, Therapeutically Applicable Research to Generate Effective Treatments, Integrative Oncogenomics and the Early Detection Research Network into a cohesive set of 187 DO terms represented by 63 top-level DO cancer terms. For example, the COSMIC term 'kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma' and the TCGA term 'Kidney renal clear cell carcinoma' were both grouped to the term 'Disease Ontology Identification (DOID):4467 / renal clear cell carcinoma', which was mapped to the TopNodes_DOcancerslim term 'DOID:263 / kidney cancer'. Mapping diverse cancer terms to DO and using top-level terms (DO slims) will enable pan-cancer analysis across datasets generated from any of the cancer term sources, where 'pan-cancer' means including or relating to all or multiple types of cancer. The terms can be browsed from the DO web site (http://www.disease-ontology.org) and downloaded from the DO's Apache Subversion or GitHub repositories. Database URL: http://www.disease-ontology.org
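
The two-step mapping described in the abstract (source term to DO term, then DO term to top-level slim term) can be illustrated with a minimal Python sketch. It is not the project's actual tooling: the dictionary layout and the names SOURCE_TO_DO, DO_TO_SLIM, to_slim and group_by_slim are hypothetical, and the only data included is the kidney example quoted above.

```python
from collections import defaultdict

# Step 1 (illustrative): (source, source term) -> DO identifier.
# Only the single example from the abstract is included here.
SOURCE_TO_DO = {
    ("COSMIC", "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma"): "DOID:4467",
    ("TCGA", "Kidney renal clear cell carcinoma"): "DOID:4467",
}

# Step 2 (illustrative): DO identifier -> top-level slim identifier.
DO_TO_SLIM = {"DOID:4467": "DOID:263"}

# Human-readable labels for the identifiers used above.
DO_LABELS = {
    "DOID:4467": "renal clear cell carcinoma",
    "DOID:263": "kidney cancer",
}


def to_slim(source, term):
    """Return the top-level slim DOID for a source term, or None if unmapped."""
    do_id = SOURCE_TO_DO.get((source, term))
    if do_id is None:
        return None
    # A DO term with no slim entry is treated as its own top-level term
    # (an assumption made for this sketch).
    return DO_TO_SLIM.get(do_id, do_id)


def group_by_slim(records):
    """Group (source, term) pairs under their slim DOID for pan-cancer analysis."""
    groups = defaultdict(list)
    for source, term in records:
        groups[to_slim(source, term)].append((source, term))
    return dict(groups)


if __name__ == "__main__":
    grouped = group_by_slim(SOURCE_TO_DO.keys())
    for slim_id, members in grouped.items():
        print(slim_id, DO_LABELS.get(slim_id, "unmapped"), len(members))
```

Grouping by the slim identifier rather than by the source term is what would let records from COSMIC and TCGA fall into the same 'kidney cancer' bucket; terms missing from the first table are collected under None so they can be reviewed instead of being silently dropped.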

Authors

  • Tsung-Jung Wu
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Lynn M Schriml
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Qing-Rong Chen
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Maureen Colbert
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Daniel J Crichton
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Richard Finney
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Ying Hu
    Department of Ultrasonography, The First Affiliated Hospital, College of Medicine, Zhejiang University, Qingchun Road No. 79, Hangzhou, Zhejiang 310003, China.
  • Warren A Kibbe
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Heather Kincaid
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Daoud Meerzaman
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Elvira Mitraka
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Yang Pan
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Krista M Smith
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Sudhir Srivastava
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Sari Ward
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Cheng Yan
    Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, NASA Jet Propulsion Laboratory, Pasadena, CA, USA, Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA, Wellcome Trust Sanger Institute, Cambridge, UK and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA.
  • Raja Mazumder
    Department of Biochemistry and Molecular Medicine, School of Medicine and Health Sciences, The George Washington University, Washington, DC, United States.