CUILESS2016: a clinical corpus applying compositional normalization of text mentions.

Journal: Journal of biomedical semantics
Published Date:

Abstract

BACKGROUND: Traditionally, text mention normalization corpora have normalized concepts to single ontology identifiers ("pre-coordinated concepts"). Less frequently, normalization corpora have used concepts with multiple identifiers ("post-coordinated concepts"), but the additional identifiers have been restricted to a defined set of relationships to the core concept. This approach limits the ability of the normalization process to express semantic meaning. We generated a freely available corpus using post-coordinated concepts that are not restricted to a defined set of relationships, which we term "compositional concepts", to evaluate the use of such concepts in clinical text.
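
The three normalization styles named in the background can be pictured as three small data structures. The following is only an illustrative sketch and not the corpus format: the class names, the relation names in ALLOWED_RELATIONS, and the identifiers such as "ID_0001" are all hypothetical placeholders, not real ontology codes.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical relation names, used only to illustrate a "defined set".
ALLOWED_RELATIONS = frozenset({"has_site", "has_severity"})


@dataclass(frozen=True)
class PreCoordinated:
    """A mention normalized to exactly one ontology identifier."""
    concept_id: str


@dataclass(frozen=True)
class PostCoordinated:
    """A core identifier plus modifiers tied to a fixed set of relations."""
    core_id: str
    modifiers: Tuple[Tuple[str, str], ...]  # (relation, identifier) pairs

    def __post_init__(self):
        for relation, _ in self.modifiers:
            if relation not in ALLOWED_RELATIONS:
                raise ValueError(f"relation not in defined set: {relation}")


@dataclass(frozen=True)
class Compositional:
    """Several identifiers with no predefined relation between them."""
    concept_ids: Tuple[str, ...]

    def __post_init__(self):
        if not self.concept_ids:
            raise ValueError("at least one identifier is required")


# Placeholder identifiers; these are not real ontology codes.
single = PreCoordinated("ID_0001")
restricted = PostCoordinated("ID_0002", (("has_site", "ID_0003"),))
composed = Compositional(("ID_0002", "ID_0003", "ID_0004"))
```

In this sketch the only difference between the second and third structures is the relation check: the compositional form accepts any combination of identifiers, which is the added expressiveness the background describes.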

Authors

  • John D Osborne
    Center for Clinical and Translational Science, University of Alabama at Birmingham, Birmingham, Alabama, USA, 35294 ozborn@uab.edu.
  • Matthew B Neu
    University of Alabama at Birmingham, 1720 7th Ave S, Birmingham, USA.
  • Maria I Danila
    University of Alabama at Birmingham, 1720 7th Ave S, Birmingham, USA.
  • Thamar Solorio
    Computer Science Department, University of Houston, Düsternbrooker Weg 20, Houston, 24105, USA.
  • Steven J Bethard
    School of Information, University of Arizona, Tucson, 85721, USA. bethard@email.arizona.edu.