A change language for ontologies and knowledge graphs.

Journal: Database : the journal of biological databases and curation
PMID:

Abstract

Ontologies and knowledge graphs (KGs) are general-purpose computable representations of some domain, such as human anatomy, and are frequently a crucial part of modern information systems. Most of these structures change over time, incorporating new knowledge or information that was previously missing. Managing these changes is a challenge, both in terms of communicating changes to users and providing mechanisms to make it easier for multiple stakeholders to contribute. To fill that need, we have created KGCL, the Knowledge Graph Change Language (https://github.com/INCATools/kgcl), a standard data model for describing changes to KGs and ontologies at a high level, and an accompanying human-readable Controlled Natural Language (CNL). This language serves two purposes: a curator can use it to request desired changes, and it can also be used to describe changes that have already happened, corresponding to the concepts of "apply patch" and "diff" commonly used for managing changes in text documents and computer programs. Another key feature of KGCL is that descriptions are at a high enough level to be useful to, and understood by, a variety of stakeholders. For example, ontology edits can be specified by commands like "add synonym 'arm' to 'forelimb'" or "move 'Parkinson disease' under 'neurodegenerative disease'." We have also built a suite of tools for managing ontology changes. These include an automated agent that integrates with and monitors GitHub ontology repositories and applies any requested changes, and a new component in the BioPortal ontology resource that allows users to make change requests directly from within the BioPortal user interface. Overall, the KGCL data model, its CNL, and associated tooling allow for easier management and processing of changes associated with the development of ontologies and KGs. Database URL: https://github.com/INCATools/kgcl.
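The "apply patch" and "diff" roles described in the abstract can be illustrated with a minimal Python sketch. It is not the KGCL tooling or its grammar: it recognizes only the two example phrasings quoted above, and the toy ontology layout (a dict of labels with "synonyms" and "parent" keys) and the function names apply_change and diff are assumptions made for illustration. The actual KGCL data model and CNL are richer and may differ in syntax.

```python
import copy
import re

# Hypothetical patterns covering only the two command phrasings quoted in the abstract.
ADD_SYNONYM = re.compile(r"^add synonym '(?P<syn>[^']+)' to '(?P<term>[^']+)'$")
MOVE = re.compile(r"^move '(?P<term>[^']+)' under '(?P<parent>[^']+)'$")


def apply_change(ontology, command):
    """Apply one change command to the toy ontology in place ("apply patch")."""
    m = ADD_SYNONYM.match(command)
    if m:
        ontology[m["term"]]["synonyms"].add(m["syn"])
        return
    m = MOVE.match(command)
    if m:
        if m["parent"] not in ontology:
            raise KeyError(m["parent"])
        ontology[m["term"]]["parent"] = m["parent"]
        return
    raise ValueError(f"unrecognized change command: {command!r}")


def diff(old, new):
    """Describe how `new` differs from `old` as a list of change commands ("diff").

    Only added synonyms and changed parents are reported; term creation,
    deletion, and synonym removal are out of scope for this sketch.
    """
    changes = []
    for label in sorted(new):
        if label not in old:
            continue
        for syn in sorted(new[label]["synonyms"] - old[label]["synonyms"]):
            changes.append(f"add synonym '{syn}' to '{label}'")
        parent = new[label]["parent"]
        if parent is not None and parent != old[label]["parent"]:
            changes.append(f"move '{label}' under '{parent}'")
    return changes


# Toy ontology: label -> {"synonyms": set of str, "parent": label or None}
before = {
    "disease": {"synonyms": set(), "parent": None},
    "neurodegenerative disease": {"synonyms": set(), "parent": "disease"},
    "Parkinson disease": {"synonyms": set(), "parent": "disease"},
    "forelimb": {"synonyms": set(), "parent": None},
}

after = copy.deepcopy(before)
apply_change(after, "add synonym 'arm' to 'forelimb'")
apply_change(after, "move 'Parkinson disease' under 'neurodegenerative disease'")

for line in diff(before, after):
    print(line)
```

In this sketch the same command strings serve both directions: a curator-style request is applied to the data, and comparing the two versions regenerates equivalent commands, which mirrors the dual use of the language that the abstract describes.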

Authors

  • Harshad Hegde
  • Jennifer Vendetti
    Center for Biomedical Informatics Research, Stanford University, 3180 Porter Dr., Palo Alto, CA 94304, United States.
  • Damien Goutte-Gattat
    Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3DY, UK.
  • J Harry Caufield
    Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States.
  • John B Graybeal
    Center for Biomedical Informatics Research, Stanford University, 3180 Porter Dr., Palo Alto, CA 94304, United States.
  • Nomi L Harris
    Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA.
  • Naouel Karam
    Institute for Applied Informatics (InfAI), Leipzig University, Goerdelerring 9, Leipzig 04109, Germany.
  • Christian Kindermann
    Center for Biomedical Informatics Research, Stanford University, 3180 Porter Dr., Palo Alto, CA 94304, United States.
  • Nicolas Matentzoglu
    School of Computer Science, University of Manchester, Oxford Road, Manchester, UK. nicolas.matentzoglu@manchester.ac.uk.
  • James A Overton
    La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America.
  • Mark A Musen
    Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305-5479, United States. Electronic address: musen@stanford.edu.
  • Christopher J Mungall
    Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.