Sortal anaphora resolution to enhance relation extraction from biomedical literature.

Journal: BMC bioinformatics
Published Date:

Abstract

BACKGROUND: Entity coreference is common in biomedical literature, and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization systems. Coreference resolution is a foundational yet challenging natural language processing task that, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that accounts for the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from the sentence level to the discourse level.

Authors

  • Halil Kilicoglu
    School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL 61820, United States.
  • Graciela Rosemblat
    Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.
  • Marcelo Fiszman
    Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, Bethesda, MD.
  • Thomas C Rindflesch
    National Library of Medicine, Bethesda, MD, USA.