Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text.

Journal: Journal of Medical Internet Research
Published Date:

Abstract

BACKGROUND: The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. Beyond this mismatch in text type, using existing NLP tools is complicated by constantly changing technologies, source vocabularies, and text characteristics. These continuously evolving challenges call for low-cost, systematic assessment. However, manual annotation, the primary accepted evaluation method in NLP, requires tremendous effort and time.

Authors

  • Albert Park
    Department of Biomedical Informatics, School of Medicine, University of Utah, 421 Wakara Way, Ste 140, Salt Lake City, UT 84108-3514, USA.
  • Andrea L Hartzler
    Group Health Research Institute, Seattle, WA.
  • Jina Huh
    Department of Media and Information, Michigan State University, East Lansing, MI.
  • David W McDonald
    Human Centered Design & Engineering, University of Washington, Seattle, WA.
  • Wanda Pratt
    Biomedical Informatics & Medical Education, University of Washington, Seattle, WA; Information School, University of Washington, Seattle, WA.