Smoking Classification Using Novel Plasma Cytokines by implementing Machine Learning and Statistical Methods.

Journal: Proceedings. International Conference on Computational Science and Computational Intelligence

Abstract

Smoking is a major cause of premature and preventable death. Tobacco exposure harms many organs and contributes to multiple diseases, including chronic obstructive pulmonary disease (COPD), cardiovascular disease, cancer, and diabetes. Cytokines are inflammatory biomarkers that are mechanistically associated with smoking. Machine learning algorithms allow the contributions of individual cytokines to tobacco-related diseases to be assessed quantitatively, and mapping cytokines to disease can help direct treatment modalities. By applying k-Nearest Neighbor (k-NN) and Random Forest algorithms to 63 plasma cytokines, we demonstrate the classification of smoking status. To optimize performance, k-fold cross-validation and hyperparameter tuning are employed. The models' ability to separate the classes is evaluated with the Area Under the Receiver Operating Characteristic curve (AUROC), and the cytokines that contribute most to the classification are identified and presented. The significance of the difference between the AUROC scores of k-NN and Random Forest was assessed with a two-sample independent t-test. k-NN achieved reasonably good classification performance, with an AUROC of 0.87 and a 95% CI of (0.823, 0.917). Random Forest exceeded k-NN, with a perfect AUROC of 1 and a 95% CI of (1, 1). Among the ten most prominent cytokines contributing to each model's classification, those common to both algorithms are LIF, IL22, G-CSF/CSF-3, and TRAIL. The AUROC scores of k-NN and Random Forest are significantly different (p-value = 5.105e-16). Discovering biomarkers such as cytokines and transferring them from molecular investigation to clinical practice can facilitate precision-medicine-based therapeutic interventions.
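The abstract names a complete workflow: k-NN and Random Forest classifiers, k-fold cross-validation, hyperparameter tuning, AUROC with a 95% CI, a two-sample independent t-test between the models, and a comparison of each model's ten most important features. The sketch below strings those steps together, assuming scikit-learn and SciPy. It runs on synthetic data, not the study's cytokine measurements, and the parameter grids, fold counts, nested cross-validation, t-based CI, and use of permutation importance to rank features are illustrative choices rather than the authors' reported settings. Note that cross-validation folds share training data, so per-fold scores are not strictly independent; the t-test here only mirrors the procedure named in the abstract.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a 63-cytokine panel (smoker = 1, non-smoker = 0).
# These are NOT the study's data; column indices stand in for cytokine names.
X, y = make_classification(n_samples=200, n_features=63, n_informative=10,
                           random_state=0)

# Inner CV tunes hyperparameters; outer CV scores the tuned model (nested CV).
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# k-NN is distance based, so features are standardized inside the pipeline
# (fit on training folds only). Grids below are illustrative assumptions.
knn = GridSearchCV(
    make_pipeline(StandardScaler(), KNeighborsClassifier()),
    {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9]},
    scoring="roc_auc", cv=inner_cv)
rf = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [None, 5]},
    scoring="roc_auc", cv=inner_cv)

# One AUROC score per outer fold for each model.
knn_auc = cross_val_score(knn, X, y, cv=outer_cv, scoring="roc_auc")
rf_auc = cross_val_score(rf, X, y, cv=outer_cv, scoring="roc_auc")


def mean_with_ci(scores, level=0.95):
    """Mean AUROC and a t-based confidence interval over the CV folds."""
    mean = float(np.mean(scores))
    half = stats.t.ppf((1 + level) / 2, len(scores) - 1) * stats.sem(scores)
    return mean, (mean - half, mean + half)


# Two-sample independent t-test on the per-fold AUROC scores.
t_stat, p_value = stats.ttest_ind(knn_auc, rf_auc)


def top_features(estimator, k=10):
    """Indices of the k features with the largest permutation importance."""
    estimator.fit(X, y)
    result = permutation_importance(estimator, X, y, scoring="roc_auc",
                                    n_repeats=5, random_state=0)
    return {int(i) for i in np.argsort(result.importances_mean)[::-1][:k]}


# Features ranked in the top ten by both models.
common = top_features(knn) & top_features(rf)

for name, auc in [("k-NN", knn_auc), ("Random Forest", rf_auc)]:
    mean, (lo, hi) = mean_with_ci(auc)
    print(f"{name}: AUROC {mean:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
print("Features in both top-10 lists:", sorted(common))
```

When every fold scores 1.0, the standard error is zero and `mean_with_ci` returns an interval of (1, 1), which is the degenerate case reported for Random Forest above.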

Authors

  • Seema Singh Saharan
    Department of Clinical Pharmacy, University of California, San Francisco, USA.
  • Pankaj Nagar
    Department of Statistics, University of Rajasthan, Jaipur, India.
  • Kate Townsend Creasy
    Cardiovascular Research Institute, Department of Medicine, University of California, San Francisco, USA.
  • Eveline O Stock
    Cardiovascular Research Institute, Department of Medicine, University of California, San Francisco, USA.
  • James Feng
    Cardiovascular Research Institute, Department of Medicine, University of California, San Francisco, USA.
  • Mary J Malloy
    Cardiovascular Research Institute, Department of Medicine, University of California, San Francisco, USA.
  • John P Kane
    Cardiovascular Research Institute, Department of Medicine, University of California, San Francisco, USA.
