The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight.

Journal: Journal of the American Medical Informatics Association : JAMIA
PMID:

Abstract

OBJECTIVE: Clinical corpora can be deidentified using a combination of machine-learned automated taggers and hiding in plain sight (HIPS) resynthesis. The latter replaces detected personally identifiable information (PII) with random surrogates, allowing leaked PII to blend in or "hide in plain sight." We evaluated the extent to which a malicious attacker could expose leaked PII in such a corpus.
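The HIPS idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the surrogate pools, the `hips_resynthesize` function, the example note, and the span format `(start, end, pii_type)` are all hypothetical, chosen only to show how PII missed by the tagger ends up sitting among random surrogates.

```python
import random

# Hypothetical surrogate pools (assumption); a real system would use far larger lists.
SURROGATES = {
    "NAME": ["Alice Moreno", "Brian Okafor", "Carla Jensen"],
    "DATE": ["03/14/2011", "11/02/2009", "07/21/2013"],
}


def hips_resynthesize(text, detected_spans, rng):
    """Replace each detected (start, end, pii_type) span with a random surrogate.

    Spans are assumed to be non-overlapping character offsets into `text`.
    """
    pieces = []
    cursor = 0
    for start, end, pii_type in sorted(detected_spans):
        pieces.append(text[cursor:start])                   # untouched text before the span
        pieces.append(rng.choice(SURROGATES[pii_type]))     # random surrogate of the same type
        cursor = end
    pieces.append(text[cursor:])                            # untouched tail, including missed PII
    return "".join(pieces)


# Invented example note (assumption), containing two names and one date.
note = "John Smith was seen on 01/05/2012 with Mary Jones."

# Assume the tagger detected the first name and the date but missed "Mary Jones".
spans = [(0, 10, "NAME"), (23, 33, "DATE")]

result = hips_resynthesize(note, spans, random.Random(0))
print(result)
```

In the output, the leaked name "Mary Jones" is still present, but a reader cannot tell from the text alone whether it is real or a surrogate. That ambiguity is the protection HIPS relies on, and it is what the attack evaluated in this study tries to break.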

Authors

  • David S Carrell
    Group Health Research Institute, Seattle, WA, 98101, USA.
  • David J Cronkite
    Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA.
  • Muqun Rachel Li
    Privacy Analytics Inc, Ottawa, Ontario, Canada.
  • Steve Nyemba
    Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
  • Bradley A Malin
    Vanderbilt University, Nashville, TN.
  • John S Aberdeen
    The MITRE Corp, Bedford, Massachusetts, USA.
  • Lynette Hirschman
    The MITRE Corporation, Bedford, MA, USA. lynette@mitre.org