A machine learning based approach to identify protected health information in Chinese clinical text.

Journal: International journal of medical informatics

Abstract

BACKGROUND: With the increasing use of electronic health records (EHRs) worldwide, protecting private information in clinical text has drawn extensive attention from both healthcare providers and researchers. De-identification, the process of identifying and removing protected health information (PHI) from clinical text, has been central to the discourse on medical privacy since 2006. Although de-identification is becoming the global norm for handling medical records, few studies have examined its application to Chinese clinical text. Without efficient and effective privacy protection algorithms in place, the use of indispensable clinical information would be restricted.

Authors

  • Liting Du
    School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Hubei, China.
  • Chenxi Xia
    School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Hubei, China.
  • Zhaohua Deng
    School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Hubei, China.
  • Gary Lu
    Dassault Systems, 175 Wyman St., Waltham, MA 02451, USA.
  • Shuxu Xia
    School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Hubei, China.
  • Jingdong Ma
    School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Hubei, China. Electronic address: jdma@hust.edu.cn.