Do Neural Information Extraction Algorithms Generalize Across Institutions?

Journal: JCO clinical cancer informatics
Published Date:

Abstract

PURPOSE: Natural language processing (NLP) techniques have been adopted to reduce the curation costs of electronic health records. However, studies have questioned whether such techniques can be applied to data from previously unseen institutions. We investigated the performance of a common neural NLP algorithm on data from both known hospitals and held-out hospitals (ie, institutions whose data were withheld from the training set and used only for testing). We also explored how diversity in the training data affects the system's generalization ability.
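
The known versus held-out distinction described above can be illustrated with a minimal sketch. This is not the authors' code: the record layout, the `institution` field, the function name `split_by_institution`, and the placeholder institution labels are all assumptions made for illustration.

```python
def split_by_institution(records, heldout):
    """Partition records so held-out institutions appear only in the test set.

    `records` is a list of dicts with a hypothetical "institution" key;
    `heldout` is a set of institution labels withheld from training.
    """
    train = [r for r in records if r["institution"] not in heldout]
    test = [r for r in records if r["institution"] in heldout]
    return train, test


# Toy data with placeholder institution labels (not from the study).
records = [
    {"institution": "A", "text": "report 1"},
    {"institution": "A", "text": "report 2"},
    {"institution": "B", "text": "report 3"},
    {"institution": "C", "text": "report 4"},
]

# Institution "C" is withheld from training and used only for testing.
train, test = split_by_institution(records, heldout={"C"})
```

Splitting at the institution level, rather than at the report level, keeps every report from a held-out hospital out of the training set, which is what allows the test score to reflect performance on a previously unseen institution.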

Authors

  • Enrico Santus
    Department of Electrical Engineering and Computer Science, CSAIL, MIT, Cambridge, Massachusetts, United States of America.
  • Clara Li
    Department of Electrical Engineering and Computer Science, CSAIL, MIT, Cambridge, USA.
  • Adam Yala
    Department of Electrical Engineering and Computer Science, CSAIL, MIT, Cambridge, USA.
  • Donald Peck
    Henry Ford Health System, Detroit, MI.
  • Rufina Soomro
    Liaquat National Hospital & Medical College, Karachi, Pakistan.
  • Naveen Faridi
    Liaquat National Hospital & Medical College, Karachi, Pakistan.
  • Isra Mamshad
    Liaquat National Hospital & Medical College, Karachi, Pakistan.
  • Rong Tang
    Division of Surgical Oncology, MGH, Boston, USA.
  • Conor R Lanahan
    Massachusetts General Hospital, Boston, MA.
  • Regina Barzilay
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA. Email: regina@csail.mit.edu.
  • Kevin Hughes
    Division of Surgical Oncology, MGH, Boston, USA.