Interpretable Probabilistic Latent Variable Models for Automatic Annotation of Clinical Text.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:

Abstract

We propose Latent Class Allocation (LCA) and Discriminative Labeled Latent Dirichlet Allocation (DL-LDA), two novel interpretable probabilistic latent variable models for automatic annotation of clinical text. Both models separate the terms that are highly characteristic of textual fragments annotated with a given set of labels from other, non-discriminative terms, but they rely on generative processes with differently structured latent variables. LCA directly learns class-specific multinomials, while DL-LDA breaks them down into topics (clusters of semantically related words). Extensive experimental evaluation indicates that the proposed models outperform Naïve Bayes, a standard probabilistic classifier, and Labeled LDA, a state-of-the-art topic model for labeled corpora, on the task of automatically annotating transcripts of motivational interviews, and that the output of the proposed models can be easily interpreted by clinical practitioners.
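The core idea shared by both models, separating label-characteristic terms from non-discriminative ones, can be illustrated with a much simpler toy model. The sketch below is NOT the authors' LCA or DL-LDA. It assumes a hypothetical two-component mixture in which each token is drawn either from a class-specific multinomial (with a hypothetical weight `LAMBDA`) or from a shared background multinomial. All vocabularies, probabilities, and label names (`change_talk`, `sustain_talk`) are invented for illustration, and the models' priors and inference procedure are not represented.

```python
import math

# Hypothetical toy parameters, invented for illustration (not learned from data).
BACKGROUND = {"the": 0.4, "i": 0.3, "to": 0.3}  # shared, non-discriminative terms
CLASS_WORDS = {                                  # hypothetical label-specific multinomials
    "change_talk": {"want": 0.5, "try": 0.5},
    "sustain_talk": {"cant": 0.5, "hard": 0.5},
}
LAMBDA = 0.5  # hypothetical probability that a token comes from the class multinomial


def token_prob(word, label):
    """p(word | label) under the mixture of class-specific and background multinomials."""
    return (LAMBDA * CLASS_WORDS[label].get(word, 0.0)
            + (1 - LAMBDA) * BACKGROUND.get(word, 0.0))


def log_likelihood(tokens, label, eps=1e-9):
    """Sum of token log-probabilities; eps smooths words unseen by both components."""
    return sum(math.log(token_prob(w, label) + eps) for w in tokens)


def annotate(tokens):
    """Assign the label with the highest likelihood (a uniform label prior is assumed)."""
    return max(CLASS_WORDS, key=lambda label: log_likelihood(tokens, label))


def discriminative_share(word, label):
    """Posterior probability that `word` was generated by the class-specific component."""
    p = token_prob(word, label)
    return 0.0 if p == 0 else LAMBDA * CLASS_WORDS[label].get(word, 0.0) / p


print(annotate(["i", "want", "to", "try"]))           # change_talk
print(discriminative_share("want", "change_talk"))   # 1.0 (purely class-specific)
print(discriminative_share("the", "change_talk"))    # 0.0 (purely background)
```

The `discriminative_share` posterior mirrors the interpretability argument in a loose way: words attributed to the class-specific component are the ones a practitioner would inspect to understand why a fragment received a label, while background words are set aside.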

Authors

  • Alexander Kotov
    Department of Computer Science, Wayne State University.
  • Mehedi Hasan
    Department of Computer Science, Wayne State University.
  • April Carcone
    Pediatric Prevention Research Center, Wayne State University.
  • Ming Dong
    Department of Computer Science, Wayne State University.
  • Sylvie Naar-King
    Pediatric Prevention Research Center, Wayne State University.
  • Kathryn Brogan Hartlieb
    Department of Dietetics and Nutrition, Florida International University.