Decomposition feature selection with applications in detecting correlated biomarkers of bipolar disorders.

Journal: Statistics in medicine
Published Date:

Abstract

Feature selection is an important initial step of exploratory analysis in biomedical studies. Its main objective is to eliminate the covariates that are uncorrelated with the outcome. For highly correlated covariates, traditional feature selection methods, such as the Lasso, tend to select one of them and eliminate the others, although some of the eliminated ones are still scientifically valuable. To alleviate this drawback, we propose a feature selection method based on covariate space decomposition, referred to herein as "Decomposition Feature Selection" (DFS), and show that this method can lead to scientifically meaningful results in studies with correlated high-dimensional data. The DFS consists of two steps: (i) decomposing the covariate space into disjoint subsets such that each subset contains only uncorrelated covariates, and (ii) identifying significant predictors by traditional feature selection within each covariate subset. We demonstrate through simulation studies that the DFS performs better in practice than Lasso-type methods when multiple highly correlated covariates need to be retained. Application of the DFS is demonstrated through a study of bipolar disorders with correlated biomarkers.
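
The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the covariate space is decomposed or which selector is used within each subset, so everything below is an assumption made for illustration: the greedy correlation-threshold partition in `decompose`, the plain coordinate-descent Lasso in `lasso_cd`, the names `dfs_select`, `threshold`, and `lam`, and the simulated data. It should not be read as the authors' implementation.

```python
import numpy as np

def decompose(X, threshold=0.5):
    """Step (i), assumed greedy rule: put each covariate into the first
    subset whose members all have |correlation| below `threshold` with it."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    subsets = []
    for j in range(X.shape[1]):
        for s in subsets:
            if all(corr[j, k] < threshold for k in s):
                s.append(j)
                break
        else:
            subsets.append([j])  # no compatible subset, so start a new one
    return subsets

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent Lasso on standardized columns, minimizing
    (1/2n)||y - Xb||^2 + lam * ||b||_1 (a stand-in for any Lasso solver)."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    resid = y - y.mean()
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            resid = resid + Xs[:, j] * beta[j]   # remove covariate j's fit
            rho = Xs[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-threshold
            resid = resid - Xs[:, j] * beta[j]   # add the updated fit back
    return beta

def dfs_select(X, y, threshold=0.5, lam=0.1):
    """Step (ii): run the selector within each subset and pool the results."""
    selected = set()
    for s in decompose(X, threshold):
        beta = lasso_cd(X[:, s], y, lam)
        selected.update(s[i] for i in np.flatnonzero(beta))
    return sorted(selected)

# Simulated example (hypothetical data): covariates 0 and 1 are highly
# correlated and both drive the outcome; covariates 2 to 5 are pure noise.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack(
    [z + 0.1 * rng.normal(size=n), z + 0.1 * rng.normal(size=n)]
    + [rng.normal(size=n) for _ in range(4)]
)
y = X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=n)

subsets = decompose(X)
selected = dfs_select(X, y)
print("subsets:", subsets)
print("selected:", selected)
```

Because the two correlated covariates land in different subsets, each one is assessed by the selector without competing against the other, which is the behaviour the abstract attributes to the DFS. The threshold and penalty values here are arbitrary choices and would need tuning in practice.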

Authors

  • Hailin Huang
    Department of Statistics, The George Washington University, Washington, District of Columbia.
  • Yuanzhang Li
    Division of Preventive Medicine, Walter Reed Army Institute of Research, Washington, District of Columbia.
  • Hua Liang
    Qilu Hospital of Shandong University, Department of Nephrology, Jinan, Shandong, China.
  • Colin O Wu
    Office of Biostatistics, NHLBI, NIH, Bethesda, MD.