BioFed: federated query processing over life sciences linked open data.

Journal: Journal of biomedical semantics
Published Date:

Abstract

BACKGROUND: Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, ideally as RDF triple data. This is a necessary step towards unified access to biological data sets, but it still requires solutions that query multiple endpoints and their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of the available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, the reliability of data resources in terms of access and quality has to be monitored over time. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain.
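A core step in query federation is source selection: deciding which of the many endpoints can contribute answers to each triple pattern of a SPARQL query, so that sub-queries are not sent to every endpoint. The sketch below illustrates one generic way to do this, assuming a precomputed capability index that maps each endpoint to the predicates it serves. It is not BioFed's actual algorithm. The endpoint URLs, the `db:`/`kegg:`/`chebi:` predicate names, `CAPABILITIES` and `select_sources` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A triple pattern (subject, predicate, object); variables start with "?".
TriplePattern = Tuple[str, str, str]

# Hypothetical capability index: endpoint URL -> predicates it is known to serve.
# A real federation engine would build such an index by probing endpoints.
CAPABILITIES: Dict[str, Set[str]] = {
    "http://example.org/drugbank/sparql": {"db:target", "db:name"},
    "http://example.org/kegg/sparql": {"kegg:pathway", "db:name"},
    "http://example.org/chebi/sparql": {"chebi:formula"},
}


def select_sources(
    patterns: List[TriplePattern],
    capabilities: Dict[str, Set[str]],
) -> Dict[TriplePattern, List[str]]:
    """Map each triple pattern to the sorted list of endpoints that may answer it."""
    all_endpoints = sorted(capabilities)
    plan: Dict[TriplePattern, List[str]] = {}
    for pattern in patterns:
        _, predicate, _ = pattern
        if predicate.startswith("?"):
            # Unbound predicate: the index cannot prune, so every endpoint is a candidate.
            plan[pattern] = all_endpoints
        else:
            # Bound predicate: keep only endpoints whose index lists that predicate.
            plan[pattern] = sorted(
                ep for ep, preds in capabilities.items() if predicate in preds
            )
    return plan


if __name__ == "__main__":
    query = [("?drug", "db:target", "?t"), ("?drug", "db:name", "?n")]
    for tp, endpoints in select_sources(query, CAPABILITIES).items():
        print(tp, "->", endpoints)
```

Here `db:target` is routed to a single endpoint and `db:name` to two, while the third endpoint receives no sub-query at all. Patterns with unbound predicates cannot be pruned by a predicate index, which is one reason engines such as FedX also probe endpoints with `ASK` queries.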

Authors

  • Ali Hasnain
    Insight Centre for Data Analytics, National University of Ireland (NUIG), Galway, Ireland. ali.hasnain@insight-centre.org.
  • Qaiser Mehmood
    Insight Centre for Data Analytics, NUIG, Galway, Ireland.
  • Syeda Sana E Zainab
    Insight Centre for Data Analytics, National University of Ireland (NUIG), Galway, Ireland.
  • Muhammad Saleem
    AKSW, University of Leipzig, Leipzig, Germany.
  • Claude Warren
    IBM, IDA Business Park, Galway, Ireland.
  • Durre Zehra
    Insight Centre for Data Analytics, National University of Ireland (NUIG), Galway, Ireland.
  • Stefan Decker
    Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany.
  • Dietrich Rebholz-Schuhmann