Identification of Patients with Family History of Pancreatic Cancer--Investigation of an NLP System Portability.

Journal: Studies in health technology and informatics
Published Date:

Abstract

In this study, we developed a rule-based natural language processing (NLP) system to identify patients with a family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. Family history identification precision was consistent across the institutions, shifting from 88.9% on the Indiana University (IU) dataset to 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. Family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6%, and 82.6%, respectively. Negation detection resulted in a precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customization of the algorithm on the new dataset improves its performance.
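
The three stages named in the abstract (section segmentation, relation discovery, negation detection) can be illustrated with a minimal Python sketch. This is not the authors' UIMA implementation: the abstract does not give the actual rules, so the section header pattern, the family member list, the concept pattern, and the negation cues below are all hypothetical and chosen only to show how such a rule-based pipeline might be wired together.

```python
import re

# All patterns below are hypothetical illustrations, not the rules used in the study.
SECTION_HEADER = re.compile(r"^(?P<name>[A-Za-z ]+):\s*", re.MULTILINE)
FAMILY_MEMBERS = ["mother", "father", "sister", "brother",
                  "aunt", "uncle", "grandmother", "grandfather"]
CONCEPT = re.compile(r"pancreatic (?:cancer|carcinoma|adenocarcinoma)", re.IGNORECASE)
NEGATION_CUES = re.compile(r"\b(?:no|denies|negative for|without)\b", re.IGNORECASE)


def segment_sections(note):
    """Section segmentation: split a note into {header: body} on 'Header:' lines."""
    matches = list(SECTION_HEADER.finditer(note))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[match.group("name").strip().lower()] = note[match.end():end].strip()
    return sections


def extract_family_history(note):
    """Return one finding per family-history sentence that mentions the concept."""
    text = segment_sections(note).get("family history", "")
    findings = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        concept = CONCEPT.search(sentence)
        if concept is None:
            continue
        # Negation detection: look for a cue anywhere before the concept mention.
        negated = bool(NEGATION_CUES.search(sentence[:concept.start()]))
        # Relation discovery: family member terms co-occurring in the same sentence.
        relatives = [member for member in FAMILY_MEMBERS
                     if re.search(rf"\b{member}\b", sentence, re.IGNORECASE)]
        findings.append({"relatives": relatives, "negated": negated})
    return findings


note = (
    "Chief Complaint: abdominal pain.\n"
    "Family History: Mother had pancreatic cancer. No pancreatic cancer in father.\n"
    "Social History: Denies tobacco use."
)
for finding in extract_family_history(note):
    print(finding)
```

Restricting the search to the family history section keeps mentions of the patient's own diagnosis out of the results. It is also the step most likely to need customization when moving between institutions, because section headers and note templates differ from site to site.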

Authors

  • Saeed Mehrabi
    Secure Exchange Solution, Rockville, MD.
  • Anand Krishnan
    Centre for Community Medicine, All India Institute of Medical Sciences, New Delhi, India.
  • Alexandra M Roch
    Department of Surgery, Indiana University, Indianapolis, IN.
  • Heidi Schmidt
    University Health Network-Princess Margaret Cancer Centre and Toronto General Hospital, Toronto, Ontario, Canada.
  • Dingcheng Li
    These authors contributed equally to this study and Dr. Li is now working at IBM; Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
  • Joe Kesterson
    Regenstrief Institute Inc., Indianapolis, IN.
  • Chris Beesley
    Regenstrief Institute Inc., Indianapolis, IN.
  • Paul Dexter
    Regenstrief Institute Inc., Indianapolis, IN.
  • Max Schmidt
    Department of Surgery, Indiana University, Indianapolis, IN.
  • Mathew Palakal
    School of Informatics and Computing, Indiana University, Indianapolis, IN.
  • Hongfang Liu
    Department of Artificial Intelligence & Informatics, Mayo Clinic, Rochester, MN, United States.