Evaluating LLMs' Potential to Identify Rare Patient Identifiers in Patient Health Records.
Journal: Studies in Health Technology and Informatics
Published Date: May 15, 2025
Abstract
This study explores the utility of Large Language Models (LLMs) for finding rare details in patient records that could make a patient identifiable. Whilst most research has focused on what we call direct patient identifiers, indirect patient identifiers have not been widely addressed. Our evaluation of patient records in which our LLM predicted mentions of indirect risks shows the potential to find these risks automatically. However, many of the highlighted risks were false positives or did not constitute a risk of identification. More work is needed to understand how to harness the potential of LLMs within de-identification pipelines for patient health records. Better de-identification of health records is important for safely improving data access and advancing research without compromising confidentiality.