sureLDA: A multidisease automated phenotyping method for the electronic health record.

Journal: Journal of the American Medical Informatics Association: JAMIA

Abstract

OBJECTIVE: A major bottleneck hindering utilization of electronic health record data for translational research is the lack of precise phenotype labels. Chart review as well as rule-based and supervised phenotyping approaches require laborious expert input, hampering applicability to studies that require many phenotypes to be defined and labeled de novo. Though International Classification of Diseases codes are often used as surrogates for true labels in this setting, these sometimes suffer from poor specificity. We propose a fully automated topic modeling algorithm to simultaneously annotate multiple phenotypes.

Authors

  • Yuri Ahuja
    Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
  • Doudou Zhou
    Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
  • Zeling He
    Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
  • Jiehuan Sun
    Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA.
  • Victor M Castro
  • Vivian Gainer
  • Shawn N Murphy
  • Chuan Hong
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • Tianxi Cai
    Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.