Development and Validation of a Natural Language Processing Algorithm to Pseudonymize Documents in the Context of a Clinical Data Warehouse.

Journal: Methods of Information in Medicine
PMID:

Abstract

OBJECTIVE: This study addresses the deidentification of clinical reports, which is needed to give researchers access to the data while protecting patient privacy. It highlights the difficulty of sharing tools and resources in this domain and reports the experience of the Greater Paris University Hospitals (AP-HP, Assistance Publique-Hôpitaux de Paris) in implementing systematic pseudonymization of the text documents in its Clinical Data Warehouse.

Authors

  • Xavier Tannier
    Sorbonne Université, Inserm, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, 75006 Paris, France. Electronic address: xavier.tannier@sorbonne-universite.fr.
  • Perceval Wajsbürt
    Sorbonne Université, Inserm, Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé (LIMICS), 75006 Paris, France.
  • Alice Calliger
    Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, France.
  • Basile Dura
    Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, France.
  • Alexandre Mouchet
    Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, France.
  • Martin Hilka
    Innovation and Data Unit, IT Department, Assistance Publique-Hôpitaux de Paris, Paris, France.
  • Romain Bey
    Centre of Research in Epidemiology and Statistics (CRESS), Université de Paris, French Institute of Health and Medical Research (INSERM), National Institute of Agricultural Research (INRA), Paris, France.