Is Multiclass Automatic Text De-Identification Worth the Effort?
Journal:
Methods of Information in Medicine
Published Date:
Sep 24, 2018
Abstract
OBJECTIVES: Automatic de-identification to remove protected health information (PHI) from clinical text can use a "binary" model that replaces redacted text with a generic tag, or a "multiclass" model whose replacement tags retain more class information. Binary models are easier to develop, but result in text that is potentially less informative. We investigated whether building a multiclass de-identification model is worth the extra effort.
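To make the binary versus multiclass distinction concrete, the following is a minimal sketch, not the authors' system. It assumes PHI spans have already been detected and are given as character offsets. The function name `redact`, the tag format `[PHI]`, the class labels `NAME` and `DATE`, and the sample note are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (start, end, phi_class) character offsets; class labels are hypothetical.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span], multiclass: bool) -> str:
    """Replace each non-overlapping PHI span with a placeholder tag."""
    pieces = []
    cursor = 0
    for start, end, phi_class in sorted(spans):
        # Keep the non-PHI text between the previous span and this one.
        pieces.append(text[cursor:start])
        # Binary: one generic tag. Multiclass: the tag keeps the PHI class.
        pieces.append(f"[{phi_class}]" if multiclass else "[PHI]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Invented sample note with hand-labelled spans (illustrative only).
note = "John Smith was seen on 03/14/2017."
spans = [(0, 10, "NAME"), (23, 33, "DATE")]

binary_text = redact(note, spans, multiclass=False)
multiclass_text = redact(note, spans, multiclass=True)
print(binary_text)      # [PHI] was seen on [PHI].
print(multiclass_text)  # [NAME] was seen on [DATE].
```

In the multiclass output a reader can still tell that a name and a date were removed, whereas the binary output only shows that something was redacted. The sketch covers the replacement step only; detecting the spans is the harder modelling problem.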