The Markyt visualisation, prediction and benchmark platform for chemical and gene entity recognition at BioCreative/CHEMDNER challenge.

Journal: Database : the journal of biological databases and curation
Published Date:

Abstract

Biomedical text mining methods and technologies have improved significantly in the last decade. Considerable effort has been invested in understanding the main challenges of biomedical literature retrieval and extraction and in proposing solutions to problems of practical interest. Most notably, community-oriented initiatives such as the BioCreative challenge have provided controlled environments for comparing automatic systems on practical biomedical tasks. In this context, the present work describes the Markyt Web-based document curation platform, which has been implemented to support the visualisation, prediction and benchmarking of chemical and gene mention annotations at the BioCreative/CHEMDNER challenge. Creating this platform is an important step towards the systematic and public evaluation of automatic prediction systems and the reusability of the knowledge compiled for the challenge. Markyt was not only critical in supporting the manual annotation and annotation revision process but also facilitated the comparative visualisation of automated results against the manually generated Gold Standard annotations and the comparative assessment of generated results. We expect that future biomedical text mining challenges and the text mining community may benefit from the Markyt platform to better explore and interpret annotations and improve automatic system predictions.

Database URLs: http://www.markyt.org and https://github.com/sing-group/Markyt

Authors

  • Martin Pérez-Pérez
    ESEI - Department of Computer Science, University of Vigo, Ourense, Spain.
  • Gael Pérez-Rodríguez
    ESEI - Department of Computer Science, University of Vigo, Ourense, Spain.
  • Obdulia Rabal
    Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain.
  • Miguel Vazquez
    Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, Madrid, Spain.
  • Julen Oyarzabal
    Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain.
  • Florentino Fdez-Riverola
    Computer Science Department, Universidad de Vigo, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain.
  • Alfonso Valencia
    Barcelona Supercomputing Center (BSC), Barcelona, Spain.
  • Martin Krallinger
    Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, Madrid, Spain.
  • Anália Lourenço
    CEB - Centre of Biological Engineering, LIBRO - Laboratório de Investigação em Biofilmes Rosário Oliveira, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal; ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004 Ourense, Spain. Electronic address: analia@uvigo.es.