A flexible two-stage anonymization framework for narrative medical records adapting to various language models.

Journal: Computers in biology and medicine

Published Date: Jun 23, 2025

Abstract

The healthcare sector increasingly relies on Electronic Health Records (EHRs), which support efficient, high-quality patient care by providing rapid access to comprehensive medical information. However, these records contain sensitive patient data that must be protected, especially when transferred to cloud environments. Identifying and anonymizing this sensitive information is challenging because it is dispersed across multiple words or phrases in unstructured narrative text. To systematically detect and anonymize sensitive information in unstructured narrative digital medical records, this study proposes a two-stage k-anonymization framework that combines natural language processing (NLP) methods with privacy-preserving techniques. The first stage extracts the sensitive entities from narrative medical records according to identifiers predefined by existing privacy rules, and the second stage generates perturbed data that satisfies k-anonymity. Fine-tuned Bidirectional Encoder Representations from Transformers (BERT) models and prompt-driven Large Language Models (LLMs) were developed and customized in this framework. Experimental results demonstrate that the framework achieves F1-scores above 90% across multiple entity types, and that the two-stage structure allows dynamic adjustment of entity categories and anonymization strategies to comply with various privacy regulations. Because many healthcare environments have minimal computational resources, the framework was optimized for deployment on standard consumer-grade computers with widely available GPUs by using Low-Rank Adaptation (LoRA) instead of full fine-tuning to reduce memory consumption, making it suitable for both large-scale and resource-constrained environments.
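
The k-anonymity condition targeted by the second stage can be illustrated with a minimal Python sketch. It is not the authors' implementation: the records, the field names (`age`, `city`), and the age-binning strategy in `generalize` are all hypothetical, and the first stage (entity extraction with BERT or an LLM) is replaced by hand-written values. The sketch only shows what it means for perturbed data to satisfy k-anonymity: every combination of quasi-identifier values must occur at least k times.

```python
from collections import Counter

# Hypothetical stand-in for stage 1 output: quasi-identifiers that an
# entity extractor might have pulled from four narrative records.
records = [
    {"age": 34, "city": "Yokohama"},
    {"age": 37, "city": "Yokohama"},
    {"age": 52, "city": "Beijing"},
    {"age": 58, "city": "Beijing"},
]

def generalize(record, age_bin=10):
    """Replace the exact age with a coarse range (one possible perturbation)."""
    low = (record["age"] // age_bin) * age_bin
    return {"age": f"{low}-{low + age_bin - 1}", "city": record["city"]}

def satisfies_k_anonymity(rows, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return all(c >= k for c in counts.values())

perturbed = [generalize(r) for r in records]
raw_ok = satisfies_k_anonymity(records, k=2)          # exact ages are unique
perturbed_ok = satisfies_k_anonymity(perturbed, k=2)  # binned ages form groups
```

In this toy data the raw records fail the check because each exact age is unique, while the binned records form two groups of two. A real system would also have to choose which entity types count as quasi-identifiers, which is the part the abstract describes as adjustable per privacy regulation.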

Authors

Jing Jia

School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China. s20160598@xs.ustb.edu.cn.

Hiroaki Nishi

Department of System Design, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan.

Keywords

Data Anonymization; Electronic Health Records; Humans; Natural Language Processing

External Resources

PubMed (40554980)
