Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

Journal: Journal of the American Medical Informatics Association (JAMIA)
Published Date:

Abstract

OBJECTIVE: Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method for developing phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated features in classification accuracy.

Authors

  • Sheng Yu
    Medical College, Guangxi University of Science and Technology, Liuzhou, Guangxi, 545005, China.
  • Katherine P Liao
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.
  • Stanley Y Shaw
    Massachusetts General Hospital, Boston, MA.
  • Vivian S Gainer
    Research Computing, Partners HealthCare, Charlestown, MA, USA.
  • Susanne E Churchill
    Research Computing, Partners HealthCare, Charlestown, MA, USA.
  • Peter Szolovits
    Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Shawn N Murphy
  • Isaac S Kohane
    Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. Isaac_Kohane@hms.harvard.edu.
  • Tianxi Cai
    Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States.