Weakly semi-supervised phenotyping using electronic health records.

Journal: Journal of Biomedical Informatics
Published Date:

Abstract

OBJECTIVE: Electronic Health Record (EHR) based phenotyping is a crucial yet challenging problem in the biomedical field. Although clinicians typically determine patient-level diagnoses via manual chart review, the sheer volume and heterogeneity of EHR data render such review time-consuming and prohibitively expensive, leading to a scarcity of clinical annotations in EHRs. Weakly supervised learning algorithms have been successfully applied to various EHR phenotyping problems because they can leverage information from large quantities of unlabeled samples to better inform predictions based on a far smaller number of patients. However, most weakly supervised methods face the challenge of choosing the right cutoff value to generate an optimal classifier. Furthermore, because they utilize only the most informative features (i.e., main ICD and NLP counts), they may fail for episodic phenotypes that cannot be consistently detected via ICD and NLP data. In this paper, we propose a label-efficient, weakly semi-supervised deep learning algorithm for EHR phenotyping (WSS-DL), which overcomes the limitations above.

Authors

  • Isabelle-Emmanuella Nogues
    Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.
  • Jun Wen
    School of Pharmacy, Second Military Medical University, Shanghai, 200433, China.
  • Yucong Lin
    Center for Statistical Science, Tsinghua University, Beijing, Beijing, China; Department of Industrial Engineering, Tsinghua University, Beijing, Beijing, China.
  • Molei Liu
    Department of Biostatistics, Harvard Chan School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, U.S.A.
  • Sara K Tedeschi
    Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts.
  • Alon Geva
    Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA.
  • Tianxi Cai
    Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States.
  • Chuan Hong
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.