Deep representation learning for clustering longitudinal survival data from electronic health records.

Journal: Nature Communications
PMID:

Abstract

Precision medicine requires accurate identification of clinically relevant patient subgroups. Electronic health records provide major opportunities for leveraging machine learning approaches to uncover novel patient subgroups. However, many existing approaches fail to adequately capture complex interactions between diagnosis trajectories and disease-relevant risk events, leading to subgroups that can still display great heterogeneity in event risk and underlying molecular mechanisms. To address this challenge, we implemented VaDeSC-EHR, a transformer-based variational autoencoder for clustering longitudinal survival data extracted from electronic health records. We show that VaDeSC-EHR outperforms baseline methods on both synthetic and real-world benchmark datasets with known ground-truth cluster labels. In an application to Crohn's disease, VaDeSC-EHR successfully identifies four distinct subgroups with divergent diagnosis trajectories and risk profiles, revealing clinically and genetically relevant factors in Crohn's disease. Our results show that VaDeSC-EHR can be a powerful tool for discovering novel patient subgroups in the development of precision medicine approaches.

Authors

  • Jiajun Qiu
    Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany. Electronic address: jiajunqiu@hotmail.com.
  • Yao Hu
    Key Laboratory of Crop Ecophysiology and Farming System in Southwest, Ministry of Agriculture, Chengdu, China.
  • Li Li
    Department of Gastric Surgery, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China.
  • Abdullah Mesut Erzurumluoglu
    Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany.
  • Ingrid Braenne
    Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany.
  • Charles Whitehurst
    Immunology & Respiratory Diseases, Boehringer-Ingelheim, Ridgefield, CT, USA.
  • Jochen Schmitz
    Immunology & Respiratory Diseases, Boehringer-Ingelheim, Ridgefield, CT, USA.
  • Jatin Arora
    Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany.
  • Boris Alexander Bartholdy
    Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany.
  • Shrey Gandhi
    Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany.
  • Pierre Khoueiry
    Department of Biochemistry and Molecular Genetics, Faculty of Medicine, American University of Beirut, PO Box 11-0236 Beirut, Lebanon.
  • Stefanie Mueller
    Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany.
  • Boris Noyvert
    Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany.
  • Zhihao Ding
    Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany.
  • Jan Nygaard Jensen
    Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riß, Germany.
  • Johann de Jong
    Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharma GmbH & Co. KG, Ingelheim 55216, Germany.