Evaluating LLMs' Potential to Identify Rare Patient Identifiers in Patient Health Records.

Journal: Studies in Health Technology and Informatics

Published Date: May 15, 2025

Abstract

This study explores the utility of Large Language Models (LLMs) to support finding rare patient record details that could make a patient identifiable. Whilst most research has focused on what we call direct patient identifiers, indirect patient identifiers are not widely addressed. Our evaluation of patient records with mentions of indirect risks predicted by our LLM shows the potential to find these risks automatically. However, many risks highlighted were false positives or did not constitute identifiable risk. More work is needed to understand how we can harness the potential of LLMs as part of our de-identification pipelines for patient health records. Better de-identification of health records is important for safely improving data access and advancing research without compromising confidentiality.

Authors

Matúš Falis, University of Edinburgh, Edinburgh.

Franz Gruber, University of Edinburgh, Edinburgh.

Samuel McInerney, University of Edinburgh, Edinburgh.

Arlene Casey, School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland, UK.

Keywords

Confidentiality; Data Anonymization; Electronic Health Records; Humans; Natural Language Processing; Patient Identification Systems; Programming Languages

External Resources

PubMed ID: 40380594
