Improving precision in concept normalization.

Journal: Pacific Symposium on Biocomputing
Published Date:

Abstract

Most natural language processing applications exhibit a trade-off between precision and recall. In some use cases, there are reasons to tilt that trade-off toward high precision. Relying on the Zipfian distribution of false positive results, we describe a strategy for increasing precision that uses a variety of pre-processing and post-processing methods. These methods draw on both knowledge-based and frequentist approaches to modeling language. Building on an existing high-performance biomedical concept recognition pipeline and a previously published manually annotated corpus, we apply this hybrid rationalist/empiricist strategy to concept normalization for eight different ontologies. Which approaches did and did not improve precision varied widely across the ontologies.
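
To make the role of the Zipfian distribution concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that a few (text, concept) pairs account for most false positives, so blocking only the most frequent ones can raise precision noticeably. The annotation tuples, the concept identifiers such as `ONT:0001`, and the function names are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical toy data: each annotation is (doc_id, span_text, concept_id).
# The concept identifiers are made up and do not refer to any real ontology.
gold = {
    ("d1", "cell", "ONT:0001"),
    ("d2", "nucleus", "ONT:0003"),
    ("d3", "membrane", "ONT:0004"),
}
predicted = [
    ("d1", "cell", "ONT:0001"),
    ("d2", "nucleus", "ONT:0003"),
    ("d3", "membrane", "ONT:0004"),
    ("d1", "lead", "ONT:0002"),   # the same spurious match recurs often
    ("d2", "lead", "ONT:0002"),
    ("d3", "lead", "ONT:0002"),
    ("d4", "lead", "ONT:0002"),
    ("d2", "set", "ONT:0005"),    # a rare spurious match
]


def precision(annotations, gold_set):
    """Fraction of predicted annotations that appear in the gold standard."""
    if not annotations:
        return 0.0
    true_positives = sum(1 for a in annotations if a in gold_set)
    return true_positives / len(annotations)


def frequent_false_positives(annotations, gold_set, k):
    """Return the k most frequent (span_text, concept_id) pairs among false positives."""
    counts = Counter(
        (text, concept)
        for doc, text, concept in annotations
        if (doc, text, concept) not in gold_set
    )
    return {pair for pair, _ in counts.most_common(k)}


def filter_predictions(annotations, blocked_pairs):
    """Post-processing step: drop every annotation whose pair is blocked."""
    return [a for a in annotations if (a[1], a[2]) not in blocked_pairs]


before = precision(predicted, gold)
blocked = frequent_false_positives(predicted, gold, k=1)
filtered = filter_predictions(predicted, blocked)
after = precision(filtered, gold)
print(f"precision before: {before:.3f}, after: {after:.3f}")
```

In a realistic setting, the blocked pairs would be derived from development data rather than from the evaluation set, and blocking a pair can also remove true positives, which is the recall cost implied by the trade-off described above.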

Authors

  • Mayla Boguslav
    Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO, USA.
  • K Bretonnel Cohen
    Computational Bioscience, University of Colorado School of Medicine, Aurora, CO 80045, USA.
  • William A Baumgartner
  • Lawrence E Hunter
    Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA.