De-identification of patient notes with recurrent neural networks.

Journal: Journal of the American Medical Informatics Association : JAMIA
Published Date:

Abstract

OBJECTIVE: Patient notes in electronic health records (EHRs) may contain critical information for medical investigations. However, to protect patient confidentiality, the vast majority of medical investigators can only access de-identified notes. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) defines 18 types of protected health information that must be removed to de-identify patient notes. Manual de-identification is impractical given the size of EHR databases, the limited number of researchers with access to non-de-identified notes, and the frequent mistakes of human annotators. A reliable automated de-identification system would consequently be of high value.

Authors

  • Franck Dernoncourt
    Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Adobe Research, San Jose, CA, USA.
  • Ji Young Lee
  • Ozlem Uzuner
    Department of Information Studies, University at Albany, SUNY, Albany, NY.
  • Peter Szolovits
    Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.