Identifying disease trajectories with predicate information from a knowledge graph.

Journal: Journal of Biomedical Semantics
Published Date:

Abstract

BACKGROUND: Knowledge graphs can represent the contents of biomedical literature and databases as subject-predicate-object triples, thereby enabling comprehensive analyses that identify, for example, relationships between diseases. Some diseases are often diagnosed in patients in specific temporal sequences, which are referred to as disease trajectories. Here, we determine whether a sequence of two diseases forms a trajectory by leveraging the predicate information from paths between (disease) proteins in a knowledge graph. Furthermore, we determine the added value of directional information of predicates for this task. To do so, we create four feature sets, based on two methods for representing indirect paths, each with and without directional information of predicates (i.e., which protein is considered the subject and which the object). The added value of the directional information of predicates is quantified by comparing the classification performance of the feature sets that include it with that of the feature sets that exclude it.
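
The difference between predicate features with and without directional information can be illustrated with a minimal sketch. It is not the authors' implementation: the toy triples, the protein identifiers, the predicate names, and the function `predicate_features` are all hypothetical. It covers only direct edges between two proteins, because the abstract does not specify the two methods used for representing indirect paths.

```python
from collections import Counter

# Toy subject-predicate-object triples; all identifiers are hypothetical.
TRIPLES = [
    ("P1", "activates", "P2"),
    ("P2", "inhibits", "P1"),
    ("P2", "activates", "P1"),
    ("P1", "binds", "P3"),
]


def predicate_features(triples, a, b, directional):
    """Count predicates on direct edges between proteins a and b.

    With directional=True, each predicate is labelled with which protein
    is the subject and which the object; otherwise only the predicate
    itself is counted.
    """
    features = Counter()
    for subj, pred, obj in triples:
        if (subj, obj) == (a, b):
            key = f"{pred}:a->b" if directional else pred
        elif (subj, obj) == (b, a):
            key = f"{pred}:b->a" if directional else pred
        else:
            continue  # triple does not connect the two proteins
        features[key] += 1
    return features


undirected = predicate_features(TRIPLES, "P1", "P2", directional=False)
directed = predicate_features(TRIPLES, "P1", "P2", directional=True)
```

In this toy example the two "activates" triples collapse into a single feature when direction is ignored, whereas the directional variant keeps them apart as separate features.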

Authors

  • Wytze J Vlietstra
    Department of Medical Informatics, Erasmus University Medical Centre, Rotterdam, 3015 GE, the Netherlands. w.vlietstra@erasmusmc.nl.
  • Rein Vos
    Department of Medical Informatics, Erasmus University Medical Centre, Rotterdam, 3015 GE, the Netherlands.
  • Marjan van den Akker
    Institute of General Practice, Johann Wolfgang Goethe University, Theodor-Stern-Kai 7, D-60590, Frankfurt, Germany.
  • Erik M van Mulligen
    Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands.
  • Jan A Kors
    Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands. j.kors@erasmusmc.nl.