De-identification of clinical free text using natural language processing: A systematic review of current approaches.

Journal: Artificial intelligence in medicine
Published Date:

Abstract

BACKGROUND: Electronic health records (EHRs) are a valuable resource for data-driven medical research. However, the presence of protected health information (PHI) makes EHRs unsuitable to be shared for research purposes. De-identification, i.e. the process of removing PHI, is a critical step in making EHR data accessible. Natural language processing has repeatedly been shown to be a feasible way of automating the de-identification process.

Authors

  • Aleksandar Kovacevic
    Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia.
  • Bojana Bašaragin
    The Institute for Artificial Intelligence Research and Development of Serbia, Fruškogorska 1, 21000 Novi Sad, Serbia. Electronic address: bojana.basaragin@ivi.ac.rs.
  • Nikola Milošević
    The Institute for Artificial Intelligence Research and Development of Serbia, Fruškogorska 1, 21000 Novi Sad, Serbia; Bayer A.G., Research and Development, Mullerstrasse 173, Berlin 13342, Germany.
  • Goran Nenadic
    School of Computer Science, University of Manchester, Manchester, UK.