Feature Selection and Machine Learning Approaches in Prediction of Current E-Cigarette Use Among U.S. Adults in 2022.

Journal: International journal of environmental research and public health
PMID:

Abstract

Feature selection is essentially the process of picking informative and relevant features from a larger collection of features. Few studies have focused on predictors for current e-cigarette use among U.S. adults using feature selection and machine learning (ML) approaches. This study aimed to perform feature selection and develop ML approaches in prediction of current e-cigarette use using the 2022 Health Information National Trends Survey (HINTS 6). The Boruta algorithm and the least absolute shrinkage and selection operator (LASSO) were used to perform feature selection of 71 variables. The random oversampling example (ROSE) method was utilized to deal with imbalance data. Five ML tools including support vector machines (SVMs), logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost) were applied to develop ML models. The overall prevalence of current e-cigarette use was 4.3%. Using the overlapped 15 variables selected by Boruta and LASSO, the RF algorithm provided the best classifier with an accuracy of 0.992, sensitivity of 0.985, F1 score of 0.991, and AUC of 0.999. Weighted logistic regression further confirmed that age, education level, smoking status, belief in the harm of e-cigarette use, binge drinking, belief in alcohol increasing cancer, and the Patient Health Questionnaire-4 (PHQ4) score were associated with e-cigarette use. This study confirmed the strength of ML techniques in survey data, and the findings will guide inquiry into behaviors and mentalities of substance users.

Authors

  • Wei Fang
    GNSS Research Center, Wuhan University, Wuhan, 430079, China.
  • Ying Liu
    The First School of Clinical Medicine, Lanzhou University, Lanzhou, China.
  • Chun Xu
    Xinjiang University of Finance and Economics, Urmqi 830011, China.
  • Xingguang Luo
    Department of Psychiatry, Yale University School of Medicine, New Haven, CT 06516, USA.
  • Kesheng Wang
    School of Nursing, Health Sciences Center, West Virginia University, Morgantown, WV 26506, USA.