Technical Note: An embedding-based medical note de-identification approach with sparse annotation.
Journal: Medical Physics
Published Date: Feb 12, 2021
Abstract
PURPOSE: Medical note de-identification is critical for protecting private information and securing data sharing in collaborative research. The task demands the complete removal of all patient names and other sensitive information, such as addresses and phone numbers, from medical records. This goal is challenging to accomplish because medical note formats and string representations vary widely. Existing de-identification approaches include pattern matching, in which extensive dictionary lists are constructed a priori, and entity tagging, which requires training on a large word-wise annotated corpus. These requirements motivate us to study an alternative to the existing approaches that reduces the annotation burden.
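To make the pattern-matching baseline concrete, here is a minimal illustrative sketch. It is not the authors' method. It assumes a small hypothetical name dictionary (`NAME_DICTIONARY`) built ahead of time and a simple regular expression for US-style phone numbers (`PHONE_PATTERN`). Both names, and the `deidentify` helper, are invented for illustration. The sketch shows how such a system depends on its prior dictionary: a reordered name string slips through.

```python
import re

# Hypothetical dictionary of known patient names, constructed a priori.
# Pattern matching can only remove names that appear in this list.
NAME_DICTIONARY = {"John Smith", "Jane Doe"}

# Illustrative pattern for US-style phone numbers such as 555-123-4567
# or (555) 123-4567; real notes contain many more format variants.
PHONE_PATTERN = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b")


def deidentify(note: str, names=NAME_DICTIONARY) -> str:
    """Replace dictionary names and phone-number matches with placeholder tags."""
    # Replace longer names first so a full name is removed before any
    # shorter dictionary entry that might be a substring of it.
    for name in sorted(names, key=len, reverse=True):
        note = re.sub(re.escape(name), "[NAME]", note, flags=re.IGNORECASE)
    # Replace every phone-number match with a generic tag.
    return PHONE_PATTERN.sub("[PHONE]", note)


if __name__ == "__main__":
    print(deidentify("Patient john smith called from (555) 123-4567."))
    # A different string representation of the same name is missed:
    print(deidentify("Seen by Smith, John."))
```

With this sketch, the first call yields `Patient [NAME] called from [PHONE].`, but `Smith, John` is left untouched because that ordering is not in the dictionary. This is the kind of string-representation variation that makes exhaustive dictionary construction burdensome.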