De-identification of Clinical Text via Bi-LSTM-CRF with Neural Language Models.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium

Published Date: Mar 4, 2020

Abstract

De-identification of clinical text, a prerequisite for reusing electronic clinical data, is a typical named entity recognition (NER) problem. A number of state-of-the-art deep learning methods for NER, such as Bi-LSTM-CRF (bidirectional long-short-term-memory conditional random fields), have been applied to de-identification. Neural language models used for language representation bring large improvements in many NLP tasks when they are integrated with other deep learning methods. In this paper, we introduce Bi-LSTM-CRF with neural language models for de-identification of clinical text, and evaluate it on the de-identification datasets of the i2b2 2014 and CEGS N-GRID 2016 challenges. Four neural language models of three types, each integrated individually with Bi-LSTM-CRF, are compared in this study. Bi-LSTM-CRF with neural language models achieves the highest "strict" micro-averaged F1-score of 95.50% on the i2b2 2014 dataset and 91.82% on the CEGS N-GRID 2016 dataset, setting new benchmark results on both datasets.

Keywords: De-identification, Named entity recognition, Bidirectional long-short-term-memory, Conditional random fields, Neural language models.
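To make the "strict" micro-averaged F1-score mentioned in the abstract concrete, here is a minimal Python sketch. It assumes the usual reading of the metric: a predicted entity counts as correct only if its document, character offsets, and type all match a gold entity exactly, and counts are pooled over all documents and types before precision and recall are computed. The tuple layout, the function name `strict_micro_f1`, and the toy annotations are hypothetical; this is not the official challenge evaluation script.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity representation: (doc_id, start_offset, end_offset, phi_type).
Entity = Tuple[str, int, int, str]


def strict_micro_f1(gold: Iterable[Entity],
                    pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match, pooled counting."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(pred)

    # Strict matching: every field of the tuple must be identical.
    true_positives = len(gold_set & pred_set)

    # Micro-averaging: one pooled count across all documents and types.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy annotations (made up for illustration only).
gold = [
    ("doc1", 0, 10, "NAME"),
    ("doc1", 25, 35, "DATE"),
    ("doc2", 5, 12, "HOSPITAL"),
]
pred = [
    ("doc1", 0, 10, "NAME"),      # exact match
    ("doc1", 25, 34, "DATE"),     # end offset differs by one: a miss under strict matching
    ("doc2", 5, 12, "HOSPITAL"),  # exact match
    ("doc2", 20, 24, "AGE"),      # spurious prediction
]

precision, recall, f1 = strict_micro_f1(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

In this toy case the DATE prediction misses by a single character and is counted as both a false positive and a false negative, which is what separates strict matching from more lenient overlap-based scoring.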

Authors

Buzhou Tang
Dehuan Jiang

Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen, China.
Qingcai Chen

Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.
Xiaolong Wang

Cardiovascular Department, Shuguang Hospital Affiliated to Shanghai University of TCM, Shanghai, China.
Jun Yan

Department of Statistics, University of Connecticut, Storrs, CT 06269, USA.
Ying Shen

Keywords

Data Anonymization; Deep Learning; Language; Natural Language Processing; Neural Networks, Computer

External Resources

PubMed (32308882)
