An open source knowledge graph ecosystem for the life sciences.

Journal: Scientific Data
Published Date:

Abstract

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.

Authors

  • Tiffany J Callahan
    Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA.
  • Ignacio J Tripodi
    Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA.
  • Adrianne L Stefanski
  • Luca Cappelletti
    Department of Computer Science "Giovanni degli Antoni," Università degli Studi di Milano, 20133 Milan, Italy.
  • Sanya B Taneja
    Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, United States.
  • Jordan M Wyrwa
    University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
  • Elena Casiraghi
    Department of Computer Science "Giovanni degli Antoni," Università degli Studi di Milano, 20133 Milan, Italy.
  • Nicolas A Matentzoglu
    Semanticly, Athens, Attiki, Greece.
  • Justin Reese
    Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  • Jonathan C Silverstein
    Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA.
  • Charles Tapley Hoyt
  • Richard D Boyce
    Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA.
  • Scott A Malec
    Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA.
  • Deepak R Unni
    SIB Swiss Institute of Bioinformatics, Basel 1015, Switzerland.
  • Marcin P Joachimiak
    Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, USA.
  • Peter N Robinson
    The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.
  • Christopher J Mungall
    Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  • Emanuele Cavalleri
    AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy.
  • Tommaso Fontana
    Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy.
  • Giorgio Valentini
    Department of Computer Science "Giovanni degli Antoni," Università degli Studi di Milano, 20133 Milan, Italy.
  • Marco Mesiti
    AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, Milan, 20133, Italy.
  • Lucas A Gillenwater
    Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
  • Brook Santangelo
    Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
  • Nicole A Vasilevsky
    Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon, 97239, USA.
  • Robert Hoehndorf
    Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia. robert.hoehndorf@kaust.edu.sa.
  • Tellen D Bennett
    Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO.
  • Patrick B Ryan
    Janssen Research and Development, Raritan, NJ, USA.
  • George Hripcsak
    Department of Biomedical Informatics, Columbia University, 622 W 168th Street, PH20, New York, NY 10032, USA; Medical Informatics Services, NewYork-Presbyterian Hospital, 622 W 168th Street, PH20, New York, NY 10032, USA. Electronic address: hripcsak@columbia.edu.
  • Michael G Kahn
    University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA.
  • Michael Bada
    Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA.
  • William A Baumgartner
  • Lawrence E Hunter
    Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA.