A machine learning methodology for medical imaging anonymization.

Journal: Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Published Date:

Abstract

Privacy protection is a major requirement for the success of electronic health record (EHR) systems, and it becomes even more critical in collaborative scenarios, where data is shared among institutions and practitioners. While textual data can be de-identified easily, patient data in medical images requires a more elaborate approach. In this work we present a solution for identifying sensitive words in medical images based on a combination of two machine-learning models, achieving an F1-score of 0.94. Three experts evaluated the system's performance. They analyzed the output of the methodology and categorized the studies into three groups: studies that had their sensitive words removed (true positive), studies with complete patient identity (false negative), and studies with mistakenly removed data (false positive). The experts were unanimous regarding the relevance of the tool in collaborative medical environments, as it may improve the exchange of anonymized patient data between institutions.
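
The abstract reports an F1-score but not the underlying counts. As a minimal sketch of how such a score follows from the three expert categories (true positive, false positive, false negative), assuming each study is counted exactly once, the standard definitions can be written as below. The function name `precision_recall_f1` and the counts passed to it are hypothetical and chosen only for illustration; they are not the paper's data.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw category counts.

    tp: studies whose sensitive words were removed
    fp: studies with data removed by mistake
    fn: studies that still expose the patient identity
    """
    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only; NOT taken from the paper.
precision, recall, f1 = precision_recall_f1(tp=90, fp=5, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, so a high score requires both few mistaken removals and few missed identifiers.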

Authors

  • Eriksson Monteiro
    Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal.
  • Carlos Costa
    Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal.
  • José Luís Oliveira
    Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal.