De-identifying free text of Japanese electronic health records.

Journal: Journal of Biomedical Semantics
Published Date:

Abstract

BACKGROUND: Electronic data sources are becoming increasingly available in the healthcare domain. Electronic health records (EHRs), with their vast amounts of potentially available data, can greatly improve healthcare. Although EHRs must be de-identified to protect personal information, automatic de-identification of Japanese-language EHRs has not been studied sufficiently. This study aimed to improve de-identification performance for Japanese EHRs by applying classic machine learning, deep learning, and rule-based methods, depending on the dataset.

Authors

  • Kohei Kajiyama
    Faculty of Informatics, Shizuoka University, Johoku 3-5-1, Naka-ku, Hamamatsu, Shizuoka, 432-8011, Japan.
  • Hiromasa Horiguchi
    National Hospital Organization Headquarters, 2-5-21 Higashigaoka, Meguro-ku, Tokyo, 152-8621, Japan.
  • Takashi Okumura
    Kitami Institute of Technology, Kitami, Hokkaido, Japan.
  • Mizuki Morita
    Okayama University, Okayama, Japan.
  • Yoshinobu Kano
    Faculty of Informatics, Shizuoka University, Hamamatsu, Shizuoka, Japan.