Latent COVID-19 Clusters in Patients with Chronic Respiratory Conditions.

Journal: Studies in health technology and informatics
Published Date:

Abstract

The goal of this paper was to apply unsupervised machine learning techniques to discover latent COVID-19 clusters in patients with chronic lower respiratory diseases (CLRD). Patients who underwent testing for SARS-CoV-2 were identified from electronic medical records. The analytical dataset comprised 2,328 CLRD patients, of whom 1,029 tested positive for COVID-19. We used the factor analysis of mixed data (FAMD) method for preprocessing. It performs principal component analysis on numeric variables and multiple correspondence analysis on categorical variables, which converts the categorical data into numeric form. Cluster analysis was an effective means both to distinguish subgroups of CLRD patients with COVID-19 and to identify patient clusters that were adversely affected by the infection. Age, comorbidity index, and race were important factors for cluster separation. Furthermore, diseases of the circulatory system, the nervous system and sense organs, the digestive system, and the genitourinary system, as well as metabolic diseases and immunity disorders, were also important criteria in the resulting cluster analyses.
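
The preprocessing and clustering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a common formulation of factor analysis of mixed data: numeric columns are z-scored, categorical columns are one-hot encoded and weighted by the inverse square root of each level's proportion, and the combined matrix is reduced by PCA. The abstract does not name the clustering algorithm, so k-means is used here purely as a stand-in. The function names (`famd_coordinates`, `kmeans`) and the eight-row toy dataset are hypothetical and do not come from the study.

```python
import numpy as np

def famd_coordinates(numeric, categorical, n_components=2):
    """Project mixed data onto principal axes (FAMD-style sketch).

    numeric: (n, p) array with no constant columns.
    categorical: list of length-n label sequences, one per variable.
    """
    numeric = np.asarray(numeric, dtype=float)
    # Numeric block: z-score each column, as in standard PCA.
    blocks = [(numeric - numeric.mean(axis=0)) / numeric.std(axis=0)]
    for labels in categorical:
        labels = np.asarray(labels)
        levels = np.unique(labels)
        # Indicator (one-hot) matrix with one column per category level.
        indicator = (labels[:, None] == levels[None, :]).astype(float)
        # MCA-style weighting: divide by sqrt(level proportion), then center.
        weighted = indicator / np.sqrt(indicator.mean(axis=0))
        blocks.append(weighted - weighted.mean(axis=0))
    combined = np.hstack(blocks)
    # PCA via SVD; rows of u * s are the coordinates of each patient.
    u, s, _ = np.linalg.svd(combined, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def assign(points, centers):
    """Return the index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = assign(points, centers)
        # Recompute each center; keep the old one if a cluster is empty.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return assign(points, centers)

# Hypothetical toy data (not from the study): age, comorbidity index,
# and one categorical variable with two levels.
numeric = np.array([[30, 1], [32, 0], [35, 1], [31, 2],
                    [70, 6], [72, 7], [75, 5], [71, 6]])
category = ["a", "a", "a", "a", "b", "b", "b", "b"]

coords = famd_coordinates(numeric, [category], n_components=2)
labels = kmeans(coords, k=2)
print(labels)
```

On this deliberately well-separated toy data, the first four rows and the last four rows fall into different clusters. Real clinical data would call for a principled choice of the number of components and clusters, which this sketch does not attempt.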

Authors

  • Wanting Cui
    Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  • Manuel Cabrera
    Columbia University Irving Medical Center, NY, USA.
  • Joseph Finkelstein
    Department of Biomedical Informatics, School of Medicine, University of Utah, USA.