Application of unsupervised deep learning algorithms for identification of specific clusters of chronic cough patients from EMR data.

Journal: BMC Bioinformatics
Published Date:

Abstract

BACKGROUND: Chronic cough affects approximately 10% of adults. Because there are no ICD codes for chronic cough, it is difficult to apply supervised learning methods to predict the characteristics of chronic cough patients, so these patients must be identified by other means. We developed a deep clustering algorithm with auto-encoder embedding (DCAE) to identify clusters of chronic cough patients in a large cohort of 264,146 patients drawn from an electronic medical record (EMR) system. We constructed features from the diagnoses recorded in the EMR, then built a clustering-oriented loss function directly on the embedded features of the deep autoencoder so that feature refinement and cluster assignment are performed jointly. Lastly, we performed statistical analysis on the identified clusters to characterize the chronic cough patients relative to the non-chronic cough patients.
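
The abstract does not give the exact form of the clustering-oriented loss. As a rough illustration only, the sketch below assumes a DEC-style formulation: a Student's t soft assignment of each embedded point to a cluster centroid, a sharpened target distribution, and a KL-divergence loss between the two. The function names, the `alpha` parameter, and the synthetic embeddings are all hypothetical and are not taken from the paper.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment of embeddings z (n, d) to centroids (k, d)."""
    # Squared Euclidean distance between every embedding and every centroid.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize so each row is a probability distribution over clusters.
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: square q and divide by the soft cluster sizes."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def clustering_loss(q, p, eps=1e-12):
    """KL(P || Q) summed over all points; eps guards against log(0)."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Synthetic stand-in for autoencoder embeddings (assumption: 4-dim, 2 clusters).
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(-2.0, 0.5, (50, 4)), rng.normal(2.0, 0.5, (50, 4))])
centroids = np.array([[-2.0] * 4, [2.0] * 4])

q = soft_assign(z, centroids)
p = target_distribution(q)
loss = clustering_loss(q, p)
labels = q.argmax(axis=1)  # hard cluster assignment per point
```

In a joint training setup of this kind, a loss like this would typically be minimized with respect to both the encoder weights and the centroids, often together with the autoencoder's reconstruction loss. Whether DCAE combines the terms in this way is not stated in the abstract.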

Authors

  • Wei Shao
  • Xiao Luo
    Department of Spine Surgery, The Third Hospital of Mianyang, Sichuan Mental Health Center, Mianyang, China.
  • Zuoyi Zhang
    Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States. Electronic address: zyizhang@indiana.edu.
  • Zhi Han
    School of Microelectronics, Southeast University, Wuxi 214135, China. 220153639@seu.edu.cn.
  • Vasu Chandrasekaran
    Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States. Electronic address: vasu.chandrasekaran@merck.com.
  • Vladimir Turzhitsky
    Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States. Electronic address: vladimir.turzhitsky@merck.com.
  • Vishal Bali
    Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States. Electronic address: vishal.bali@merck.com.
  • Anna R Roberts
    Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN, 46202, United States. Electronic address: annarobe@regenstrief.org.
  • Megan Metzger
    Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN, 46202, United States. Electronic address: mmw@iu.edu.
  • Jarod Baker
    Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN, 46202, United States. Electronic address: bakerjar@regenstrief.org.
  • Carmen La Rosa
    Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States. Electronic address: carmen.la.rosa@merck.com.
  • Jessica Weaver
    Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States. Electronic address: jessica.weaver@merck.com.
  • Paul Dexter
    Regenstrief Institute Inc., Indianapolis, IN.
  • Kun Huang
    Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA. Kun.Huang@osumc.edu.