Identifying protected health information by transformers-based deep learning approach in Chinese medical text.

Journal: Health informatics journal

PMID: 39862116

Abstract

This paper proposes a deep learning algorithm based on Bidirectional Encoder Representations from Transformers (BERT) for identifying privacy information in Chinese clinical texts, and verifies the feasibility of the method for privacy protection in the Chinese clinical context. We collected and double-annotated 33,017 discharge summaries from 151 medical institutions on a municipal regional health information platform, developed a BERT-based model combining Bidirectional Long Short-Term Memory (BiLSTM) and Conditional Random Field (CRF) layers, and tested its privacy-identification performance on the dataset. To explore the contribution of different substructures of the neural network, we created five additional baseline models and compared their performance. On the annotated data, the BERT model pre-trained on a medical corpus showed a significant performance improvement over the BiLSTM-CRF model, with a micro-recall of 0.979 and an F1 value of 0.976, which indicates that the model performs well in identifying private information in Chinese clinical texts. The BERT-based BiLSTM-CRF model excels at identifying privacy information in Chinese clinical texts, and applying it is effective for protecting patient privacy and facilitating data sharing.
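The micro-averaged recall and F1 values reported above can be illustrated with a minimal sketch. The abstract does not state the tagging scheme, the privacy categories, or whether scoring was done per token or per entity, so the following assumes BIO tags and exact-match entity-level scoring. The labels `NAME`, `DATE`, and `ID` and the helper names `extract_spans` and `micro_prf` are hypothetical and not taken from the paper.

```python
from typing import List, Set, Tuple

# (sentence index, start token, end token exclusive, entity type)
Span = Tuple[int, int, int, str]


def extract_spans(tags: List[str], sent_id: int = 0) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        # An open span ends when the tag does not continue the same type.
        if start is not None and tag != f"I-{label}":
            spans.add((sent_id, start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def micro_prf(gold_seqs: List[List[str]], pred_seqs: List[List[str]]):
    """Micro-averaged precision, recall, F1: counts pooled over all types."""
    gold: Set[Span] = set()
    pred: Set[Span] = set()
    for idx, (g, p) in enumerate(zip(gold_seqs, pred_seqs)):
        gold |= extract_spans(g, idx)
        pred |= extract_spans(p, idx)
    tp = len(gold & pred)  # a prediction counts only on exact span and type match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data with hypothetical labels; not drawn from the study's dataset.
gold = [["B-NAME", "I-NAME", "O", "B-DATE"], ["O", "B-ID", "I-ID", "I-ID"]]
pred = [["B-NAME", "I-NAME", "O", "O"], ["O", "B-ID", "I-ID", "O"]]
precision, recall, f1 = micro_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case only the `NAME` span matches exactly, so one of three gold entities is recovered. Pooling counts across entity types before dividing is what makes the average "micro"; a macro average would instead compute the metric per type and then average the results.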

Authors

Kun Xu

Department of Hygienic Inspection, School of Public Health, Jilin University, 1163 Xinmin Street, Changchun 130021, Jilin, China. Electronic addresses: songxiuling@jlu.edu.cn, li_juan@jlu.edu.cn, jinmh@jlu.edu.cn. Tel.: +86 43185619441.

Yang Song

Biomedical and Multimedia Information Technology (BMIT) Research Group, School of IT, University of Sydney, NSW 2006, Australia. Electronic address: yson1723@uni.sydney.edu.au.

Jingdong Ma

School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Hubei, China. Electronic address: jdma@hust.edu.cn.

Keywords

Algorithms; China; Computer Security; Confidentiality; Deep Learning; Electronic Health Records; Humans; Neural Networks, Computer
