Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium

PMID: 29854175

Abstract

De-identification of clinical notes is a special case of named entity recognition. Supervised machine-learning (ML) algorithms have achieved promising results for this task. However, ML-based de-identification systems often require annotating a large number of clinical notes of the target type, which is costly. Domain adaptation (DA) is a family of techniques that enables learning from annotated datasets from different sources, thereby reducing the annotation cost required for ML training in the target domain. In this study, we investigate the use of DA methods for de-identification of psychiatric notes. Three state-of-the-art DA methods (instance pruning, instance weighting, and feature augmentation) are applied to three source corpora of annotated notes: hospital discharge summaries, outpatient notes, and a mixture of different note types written for diabetic patients. Our results show that DA can improve de-identification performance over the baselines, indicating that it can effectively reduce the annotation cost for the target psychiatric notes. Among the three DA methods, feature augmentation yields the largest performance increase. Performance also varies with the type of source notes: the corpus that mixes different note types brings the biggest increase.
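To make the best-performing method concrete, the sketch below shows feature augmentation in its commonly used form (Daumé III, 2007). It is an assumption that the study follows exactly this formulation; the abstract does not spell it out. Each token's features are copied into a shared "general" version and a domain-specific version, so a linear sequence labeler trained on source and target notes together can learn which cues transfer across note types and which do not. The function names, domain labels, and feature names are hypothetical and chosen for illustration only.

```python
from typing import Dict, List

# One token's sparse features, e.g. {"word=Smith": 1.0, "is_capitalized": 1.0}
Features = Dict[str, float]


def augment(features: Features, domain: str) -> Features:
    """Map one token's features into the augmented feature space.

    Every original feature f is emitted twice: once as a shared feature
    'general:f' and once as a domain-specific feature '<domain>:f'.
    Features belonging to other domains are simply absent (i.e. zero).
    """
    augmented: Features = {}
    for name, value in features.items():
        augmented[f"general:{name}"] = value
        augmented[f"{domain}:{name}"] = value
    return augmented


def augment_corpus(sentences: List[List[Features]], domain: str) -> List[List[Features]]:
    """Apply augment() to every token of every sentence in a corpus from one domain."""
    return [[augment(token, domain) for token in sentence] for sentence in sentences]


if __name__ == "__main__":
    # Hypothetical features for the same surname seen in a source note
    # (e.g. a discharge summary) and in a target psychiatric note.
    source_token = {"word=Smith": 1.0, "is_capitalized": 1.0}
    target_token = {"word=Smith": 1.0, "prev=Dr.": 1.0}
    print(augment(source_token, "source"))
    print(augment(target_token, "target"))
```

The augmented sentences from all corpora can then be concatenated and passed to any sequence labeler that accepts per-token feature dictionaries (for example, sklearn-crfsuite's `CRF`). Because the "general" copies are shared, evidence from the larger source corpus still informs predictions on the target notes, while the domain-specific copies absorb cues that hold in only one note type.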

Authors

Hee-Jin Lee

University of Texas Health Science Center at Houston, Houston, TX.
Yaoyun Zhang

Alibaba Damo Academy, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang, China.
Kirk Roberts

The University of Texas Health Science Center at Houston, USA.
Hua Xu

Department of Urology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.

Keywords

Algorithms; Data Anonymization; Datasets as Topic; Diabetes Mellitus; Electronic Health Records; Humans; Machine Learning; Natural Language Processing; Outpatients; Patient Discharge Summaries; Psychiatry

External Resources

PubMed (29854175)