NEREL-BIO: a dataset of biomedical abstracts annotated with nested named entities.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: This article describes NEREL-BIO, an annotation scheme and corpus of PubMed abstracts in Russian and a smaller number of abstracts in English. NEREL-BIO extends the general-domain dataset NEREL by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both the general and biomedical domains, making it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL. Nested named entities may cross entity boundaries, connecting to shorter entities nested within longer ones, which makes them harder to detect.
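
To make the idea of nesting concrete, the following is a minimal sketch of how nested entity spans might be represented and detected with character offsets. The example phrase, the `Span` type, the `nested_pairs` helper, and the labels `DISEASE` and `ANATOMY` are illustrative assumptions, not the corpus's actual tag set, file format, or tooling.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """A hypothetical entity annotation over a text."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # illustrative label, not necessarily a NEREL-BIO type


def nested_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (outer, inner) pairs where inner lies strictly inside outer."""
    pairs = []
    for outer in spans:
        for inner in spans:
            same_extent = (inner.start, inner.end) == (outer.start, outer.end)
            contained = outer.start <= inner.start and inner.end <= outer.end
            if contained and not same_extent:
                pairs.append((outer, inner))
    return pairs


# Illustrative example: a shorter entity nested within a longer one.
text = "lung cancer"
spans = [
    Span(0, 11, "DISEASE"),  # "lung cancer"
    Span(0, 4, "ANATOMY"),   # "lung"
]

pairs = nested_pairs(spans)
for outer, inner in pairs:
    print(
        f"{inner.label} '{text[inner.start:inner.end]}' is nested in "
        f"{outer.label} '{text[outer.start:outer.end]}'"
    )
```

A flat tagger that assigns one label per token could keep only one of the two spans in this example, which is one way to see why nested entities are harder to detect.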

Authors

  • Natalia Loukachevitch
    Lomonosov Moscow State University, Moscow 19899, Russia.
  • Suresh Manandhar
    Madan Bhandari University of Science and Technology, Chitlang 44600, Nepal.
  • Elina Baral
    Madan Bhandari University of Science and Technology, Chitlang 44600, Nepal.
  • Igor Rozhkov
    Lomonosov Moscow State University, Moscow 19899, Russia.
  • Pavel Braslavski
    Ural Federal University, Yekaterinburg 620002, Russia.
  • Vladimir Ivanov
    GNS Healthcare, Cambridge, MA, USA.
  • Tatiana Batura
    A.P. Ershov Institute of Informatics Systems, Novosibirsk 630090, Russia.
  • Elena Tutubalina
    Kazan (Volga Region) Federal University, Kazan, Russia.