Impact of De-Identification on Clinical Text Classification Using Traditional and Deep Learning Classifiers.

Journal: Studies in health technology and informatics
Published Date:

Abstract

Clinical text de-identification enables collaborative research while protecting patient privacy and confidentiality; however, concerns persist that de-identification reduces the utility of the text for information extraction and machine learning tasks. In the context of a deep learning experiment to detect altered mental status in emergency department provider notes, we tested several classifiers on clinical notes in their original form and on their automatically de-identified counterparts. We tested both traditional bag-of-words machine learning models and word-embedding-based deep learning models. We evaluated the models on 1,113 history of present illness notes. A total of 1,795 protected health information tokens were replaced in the de-identification process across all notes. The deep learning models performed best, with accuracies of 95% on both the original and the de-identified notes. None of the models, however, showed a significant difference in performance between the original and the de-identified notes.
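
To make the comparison concrete, the following minimal Python sketch illustrates how replacing protected health information (PHI) tokens with placeholders changes a bag-of-words representation. It is not the de-identification system or the feature pipeline used in the study: the two regular expressions, the placeholder tags, the tokenizer, and the example note are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical PHI patterns for illustration only; a real de-identification
# system is far more robust than these two toy regular expressions.
PHI_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\bDr\.\s+[A-Z][a-z]+\b"), "[NAME]"),
]


def deidentify(text):
    """Replace each PHI match with a placeholder tag.

    Returns the rewritten text and the number of replacements made.
    """
    total = 0
    for pattern, tag in PHI_PATTERNS:
        text, n = pattern.subn(tag, text)
        total += n
    return text, total


def bag_of_words(text):
    """Count lowercased tokens; placeholders such as [date] stay single tokens."""
    return Counter(re.findall(r"\[[a-z]+\]|[a-z]+", text.lower()))


# Invented example note (not taken from the study data).
note = ("Patient seen by Dr. Smith on 3/14/2019 with confusion "
        "and altered mental status.")

clean, n_replaced = deidentify(note)
bow_original = bag_of_words(note)
bow_deid = bag_of_words(clean)

# Tokens that exist in only one of the two representations.
only_original = set(bow_original) - set(bow_deid)
only_deid = set(bow_deid) - set(bow_original)
shared = set(bow_original) & set(bow_deid)

print(clean)
print("replacements:", n_replaced)
print("only in original:", sorted(only_original))
print("only in de-identified:", sorted(only_deid))
print("shared tokens:", len(shared))
```

In this toy case the clinically relevant tokens (for example "confusion", "altered", "mental", "status") are shared by both representations, and only the PHI tokens differ. That is one plausible reason a classifier could perform similarly on both versions, although the sketch itself does not demonstrate the reported result.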

Authors

  • Jihad S Obeid
    Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC 29425, United States.
  • Paul M Heider
    Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA.
  • Erin R Weeda
    Department of Clinical Pharmacy and Outcome Sciences, Medical University of South Carolina, Charleston, SC, USA.
  • Andrew J Matuskowitz
    Department of Emergency Medicine, Medical University of South Carolina, Charleston, SC, USA.
  • Christine M Carr
    Department of Emergency Medicine, Medical University of South Carolina, Charleston, SC, USA.
  • Kevin Gagnon
    Department of Computer Science, University of South Carolina, Columbia, SC, USA.
  • Tami Crawford
    Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA.
  • Stéphane M Meystre
    Department of Biomedical Informatics, University of Utah, Salt Lake City, USA.