The Unified Phenotype Ontology: a framework for cross-species integrative phenomics.

Journal: Genetics
PMID:

Abstract

Phenotypic data are critical for understanding biological mechanisms and consequences of genomic variation, and are pivotal for clinical use cases such as disease diagnostics and treatment development. For over a century, vast quantities of phenotype data have been collected in many different contexts covering a variety of organisms. The emerging field of phenomics focuses on integrating and interpreting these data to inform biological hypotheses. A major impediment in phenomics is the wide range of distinct and disconnected approaches to recording the observable characteristics of an organism. Phenotype data are collected and curated using free text, single terms or combinations of terms, using multiple vocabularies, terminologies, or ontologies. Integrating these heterogeneous and often siloed data enables the application of biological knowledge both within and across species. Existing integration efforts are typically limited to mappings between pairs of terminologies; a generic knowledge representation that captures the full range of cross-species phenomics data is much needed. We have developed the Unified Phenotype Ontology (uPheno) framework, a community effort to provide an integration layer over domain-specific phenotype ontologies, as a single, unified, logical representation. uPheno comprises (1) a system for consistent computational definition of phenotype terms using ontology design patterns, maintained as a community library; (2) a hierarchical vocabulary of species-neutral phenotype terms under which their species-specific counterparts are grouped; and (3) mapping tables between species-specific ontologies. This harmonized representation supports use cases such as cross-species integration of genotype-phenotype associations from different organisms and cross-species informed variant prioritization.
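
As a rough illustration of how components (1) to (3) could fit together, the following minimal Python sketch groups species-specific terms that share the same pattern-based definition and derives cross-species mapping rows from each group. It is not the uPheno implementation: the identifiers, the pattern name `abnormalMorphology`, and the helper functions are hypothetical placeholders chosen for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class PhenotypeTerm(NamedTuple):
    term_id: str   # placeholder identifier, NOT a real ontology ID
    species: str
    pattern: str   # hypothetical design-pattern name used to define the term
    entity: str    # species-neutral filler for the pattern's entity slot


def group_by_definition(terms):
    """Group terms that share a pattern and entity under one species-neutral key."""
    groups = defaultdict(list)
    for term in terms:
        groups[(term.pattern, term.entity)].append(term)
    return dict(groups)


def pairwise_mappings(groups):
    """Derive cross-species mapping rows (term_a, term_b) from each group."""
    rows = []
    for members in groups.values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if a.species != b.species:
                    rows.append((a.term_id, b.term_id))
    return rows


# Illustrative data only; all identifiers are invented for this sketch.
terms = [
    PhenotypeTerm("HUMAN:0000001", "human", "abnormalMorphology", "heart"),
    PhenotypeTerm("MOUSE:0000001", "mouse", "abnormalMorphology", "heart"),
    PhenotypeTerm("FISH:0000001", "zebrafish", "abnormalMorphology", "heart"),
    PhenotypeTerm("MOUSE:0000002", "mouse", "abnormalMorphology", "tail"),
]

groups = group_by_definition(terms)
mappings = pairwise_mappings(groups)

for (pattern, entity), members in groups.items():
    print(pattern, entity, [m.term_id for m in members])
print(mappings)
```

In this toy setup each `(pattern, entity)` key plays the role of a species-neutral grouping term, and the mapping rows fall out of shared definitions rather than being curated pair by pair.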

Authors

  • Nicolas Matentzoglu
    School of Computer Science, University of Manchester, Oxford Road, Manchester, UK. nicolas.matentzoglu@manchester.ac.uk.
  • Susan M Bello
    The Jackson Laboratory, 600 Main St, Bar Harbor, ME 04609, USA.
  • Ray Stefancsik
    Samples Phenotypes and Ontologies Team (SPOT), European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
  • Sarah M Alghamdi
    King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia.
  • Anna V Anagnostopoulos
    The Jackson Laboratory, Bar Harbor, ME, USA.
  • James P Balhoff
    National Evolutionary Synthesis Center, Durham, NC 27705, USA; University of North Carolina, Chapel Hill, NC 27599, USA.
  • Meghan A Balk
    Natural History Museum, University of Oslo, Oslo, Norway.
  • Yvonne M Bradford
    ZFIN, the Zebrafish Model Organism Database, 5291 University of Oregon, Eugene, OR, 97403, USA.
  • Yasemin Bridges
    William Harvey Research Institute, Queen Mary University of London, London, E1 4NS, UK.
  • Tiffany J Callahan
    Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA.
  • Harry Caufield
    Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  • Alayne Cuzick
    Department of Biointeractions and Crop Protection, Rothamsted Research, West Common, Harpenden, AL5 2JQ, UK.
  • Leigh C Carmody
    Monarch Initiative (monarchinitiative.org).
  • Anita R Caron
    Samples Phenotypes and Ontologies Team (SPOT), European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
  • Vinicius de Souza
    European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK.
  • Stacia R Engel
    Saccharomyces Genome Database, Department of Genetics, Stanford University, Porter Drive, Palo Alto, CA, USA.
  • Petra Fey
    dictyBase, Biomedical Informatics Center and Center for Genetic Medicine, Northwestern University, Feinberg School of Medicine, North Lake Shore Drive, Chicago, IL, USA.
  • Malcolm Fisher
    Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA.
  • Sarah Gehrke
    Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA.
  • Christian Grove
    Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA.
  • Peter Hansen
    Department of Computer Science, Middlebury College, Middlebury, VT 05753, United States.
  • Nomi L Harris
    Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA.
  • Midori A Harris
    PomBase, Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Sanger Building, Tennis Court Road, Cambridge, UK.
  • Laura Harris
    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK.
  • Arwa Ibrahim
    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
  • Julius O B Jacobsen
    Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK.
  • Sebastian Köhler
    School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia; Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia; Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany; European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; National Institute of Informatics, Hitotsubashi, Tokyo, Japan; Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK; LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal; Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia; School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia; Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia; Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia; Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia; Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia; Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany; Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany.
  • Julie A McMurry
    Monarch Initiative, monarchinitiative.org.
  • Violeta Munoz-Fuentes
    UNEP-WCMC, Cambridge CB3 0DL, UK.
  • Monica C Munoz-Torres
    Monarch Initiative.
  • Helen Parkinson
    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
  • Zoë M Pendlington
    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK.
  • Clare Pilgrim
    Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3DY, UK.
  • Sofia M C Robb
    Stowers Institute for Medical Research, Kansas City, MO 64110, USA.
  • Peter N Robinson
    The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.
  • James Seager
    Department of Biointeractions and Crop Protection, Rothamsted Research, West Common, Harpenden, AL5 2JQ, UK.
  • Erik Segerdell
    Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
  • Damian Smedley
    School of ITEE, The University of Queensland, St. Lucia, QLD 4072, Australia; Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia; Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin, Germany; European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; National Institute of Informatics, Hitotsubashi, Tokyo, Japan; Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK; LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal; Genetic Services of Western Australia, King Edward Memorial Hospital, WA 6008, Australia; School of Paediatrics and Child Health, University of Western Australia, WA 6008, Australia; Institute for Immunology and Infectious Diseases, Murdoch University, WA 6150, Australia; Office of Population Health, Public Health and Clinical Services Division, Western Australian Department of Health, WA 6004, Australia; Academic Department of Medical Genetics, Sydney Children's Hospitals Network (Westmead), NSW 2145, Australia; Discipline of Genetic Medicine, Sydney Medical School, The University of Sydney, NSW 2006, Australia; Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany; Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; and Berlin Brandenburg Center for Regenerative Therapies, 13353 Berlin, Germany.
  • Elliot Sollis
    European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
  • Sabrina Toro
    Zebrafish Information Network, University of Oregon, Eugene, OR, USA.
  • Nicole Vasilevsky
    Library, Oregon Health & Science University, Portland, OR 97239, USA.
  • Valerie Wood
    PomBase, Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Sanger Building, Tennis Court Road, Cambridge, UK.
  • Melissa A Haendel
    Library, Oregon Health & Science University, Portland, OR 97239, USA.
  • Christopher J Mungall
    Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  • James A McLaughlin
    European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK.
  • David Osumi-Sutherland
    European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK.