A study of deep learning methods for de-identification of clinical notes in cross-institute settings.

Journal: BMC Medical Informatics and Decision Making
Published Date:

Abstract

BACKGROUND: De-identification is a critical technology for facilitating the use of unstructured clinical text while protecting patient privacy and confidentiality. The clinical natural language processing (NLP) community has invested considerable effort in developing methods and corpora for de-identification of clinical notes. These annotated corpora are valuable resources for developing automated systems to de-identify clinical text at local hospitals. However, existing studies have often used training and test data collected from the same institution, and few have explored automated de-identification in cross-institute settings. The goal of this study is to examine deep learning-based de-identification methods in a cross-institute setting, identify the bottlenecks, and provide potential solutions.

Authors

  • Xi Yang
    Department of Health Outcomes and Biomedical Informatics.
  • Tianchen Lyu
    Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building 2004 Mowry Road, PO Box 100177, Gainesville, Florida, USA.
  • Qian Li
    Emergency and Critical Care Center, Department of Emergency Medicine, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, Zhejiang, China.
  • Chih-Yin Lee
    Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Clinical and Translational Research Building 2004 Mowry Road, PO Box 100177, Gainesville, Florida, USA.
  • Jiang Bian
    Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, United States of America.
  • William R Hogan
    Department of Health Outcomes and Biomedical Informatics.
  • Yonghui Wu
    Department of Health Outcomes and Biomedical Informatics.