Task definition, annotated dataset, and supervised natural language processing models for symptom extraction from unstructured clinical notes.

Journal: Journal of Biomedical Informatics
Published Date:

Abstract

INTRODUCTION: Machine learning (ML) and natural language processing have great potential to improve information extraction (IE) within electronic medical records (EMRs) for a wide variety of clinical search and summarization tools. Despite ML advancements, clinical adoption of real-time IE tools for patient care remains low. Clinically motivated IE task definitions, publicly available annotated clinical datasets, and the inclusion of subtasks such as coreference resolution and named entity normalization are critical for the development of useful clinical tools.

Authors

  • Jackson M Steinkamp
    Department of Radiology, Hospital of the University of Pennsylvania, 3400 Spruce St, Philadelphia, PA 19104 (J.M.S., T.P., J.A., C.E.K., T.S.C.); and Boston University School of Medicine, Boston, Mass (J.M.S.).
  • Wasif Bala
    Boston University School of Medicine, Boston, MA 02215, United States.
  • Abhinav Sharma
    Department of Biological Sciences and Bioengineering (BSBE), IIT, Kanpur, India.
  • Jacob J Kantrowitz
    Internal Medicine, Kent Hospital, Brown University Alpert Medical School, Warwick, RI, 02886, United States.