Is Multiclass Automatic Text De-Identification Worth the Effort?

Journal: Methods of Information in Medicine
Published Date:

Abstract

OBJECTIVES: Automatic de-identification, which removes protected health information (PHI) from clinical text, can use a "binary" model that replaces redacted text with a generic tag (e.g., ""), or a "multiclass" model that retains more class information (e.g., ""). Binary models are easier to develop, but the resulting text is potentially less informative. We investigated whether building a multiclass de-identification model is worth the extra effort.
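
To make the distinction between the two output styles concrete, here is a minimal sketch in Python. It is not the authors' system: the `redact` helper, the tag strings `[PHI]`, `[NAME]`, and `[DATE]`, and the sample note are all hypothetical, chosen only for illustration. It assumes the PHI spans have already been detected and labeled, and shows only the replacement step.

```python
from typing import List, Tuple

# Each annotation is (start, end, phi_class). The class names used below
# are hypothetical and do not come from the article.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span], multiclass: bool) -> str:
    """Replace each annotated span with a tag.

    Binary mode uses one generic tag for every span; multiclass mode
    keeps the PHI class in the tag. Assumes spans do not overlap.
    """
    pieces = []
    cursor = 0
    for start, end, phi_class in sorted(spans):
        pieces.append(text[cursor:start])  # keep the text before the span
        pieces.append(f"[{phi_class}]" if multiclass else "[PHI]")
        cursor = end
    pieces.append(text[cursor:])  # keep the text after the last span
    return "".join(pieces)


# Hypothetical note with hand-made annotations.
note = "John Smith was seen on 03/14/2015."
spans = [(0, 10, "NAME"), (23, 33, "DATE")]

binary_text = redact(note, spans, multiclass=False)
multiclass_text = redact(note, spans, multiclass=True)

print(binary_text)      # [PHI] was seen on [PHI].
print(multiclass_text)  # [NAME] was seen on [DATE].
```

In the binary output a reader can no longer tell that the first redaction was a name and the second a date; the multiclass output preserves that information at the cost of requiring a model that distinguishes PHI classes.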

Authors

  • Duy Duc An Bui
  • David T Redden
  • James J Cimino