Learning statistical models of phenotypes using noisy labeled training data.

Journal: Journal of the American Medical Informatics Association : JAMIA
PMID:

Abstract

OBJECTIVE: Traditionally, patient groups with a phenotype are selected through rule-based definitions whose creation and validation are time-consuming. Machine learning approaches to electronic phenotyping are limited by the paucity of labeled training datasets. We demonstrate the feasibility of utilizing semi-automatically labeled training sets to create phenotype models via machine learning, using a comprehensive representation of the patient medical record.

Authors

  • Vibhu Agarwal
    Biomedical Informatics Training Program, Stanford University, Stanford CA 94305-5479, USA. vibhua@stanford.edu
  • Tanya Podchiyska
    Stanford Center for Clinical Informatics, Stanford University, Stanford, CA, USA.
  • Juan M Banda
    Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA.
  • Veena Goel
    Department of Pediatrics, Stanford University School of Medicine, Stanford CA 94305-5208, USA.
  • Tiffany I Leung
    Division of General Medical Disciplines, Stanford University, Stanford CA 94305, USA.
  • Evan P Minty
    Biomedical Informatics Training Program, Stanford University, Stanford CA 94305-5479, USA.
  • Timothy E Sweeney
    Biomedical Informatics Training Program, Stanford University, Stanford CA 94305-5479, USA.
  • Elsie Gyang
    Division of Vascular Surgery, Stanford Hospital & Clinics, Stanford CA 94305-5642, USA.
  • Nigam H Shah
    Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA.