Knockoff boosted tree for model-free variable selection.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: The recently proposed knockoff filter is a general framework for controlling the false discovery rate (FDR) when performing variable selection. This powerful new approach generates a 'knockoff' of each variable tested to achieve exact FDR control. These imitation variables mimic the correlation structure found within the original variables and serve as negative controls for statistical inference. Current applications of knockoff methods use linear regression models and conduct variable selection only for variables that appear in the model function. Here, we extend the use of knockoffs to machine learning with boosted trees, which are successful and widely used in problems where no prior knowledge of the model function is required. However, currently available importance scores in tree models are insufficient for variable selection with FDR control.
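To make the role of knockoffs as negative controls concrete, the following is a minimal sketch of the selection step of the generic knockoff filter (the "knockoff+" threshold from the original knockoff filter literature), not the tree-based procedure of this paper. It assumes that a statistic W_j has already been computed for each variable, with large positive values indicating that the original variable looks more important than its knockoff. How the knockoffs are generated and how W_j is derived from a boosted tree model are not described in this abstract, so the function names and the input values below are hypothetical.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Return the knockoff+ threshold for statistics w at target FDR level q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns infinity when no candidate satisfies the bound.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # knockoff "wins": estimated false discoveries
        positives = sum(1 for x in w if x >= t)   # original "wins": candidate discoveries
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def select_variables(w: Sequence[float], q: float) -> List[int]:
    """Return indices of variables whose statistic meets the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten variables (illustrative values, not real data).
w_example = [5.0, 4.0, 3.5, 3.0, 2.5, -1.0, 0.5, 2.0, -0.2, 1.5]
selected = select_variables(w_example, q=0.2)
print(selected)
```

The negative statistics act as the negative controls: the number of variables whose knockoff beats the original at a given magnitude estimates how many of the positive selections at that magnitude are false.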

Authors

  • Tao Jiang
    Department of Respiratory and Critical Care Medicine, Center for Respiratory Medicine, the Fourth Affiliated Hospital of School of Medicine, and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, China.
  • Yuanyuan Li
    Key Laboratory of Environment and Health, Ministry of Education & Ministry of Environmental Protection, State Key Laboratory of Environmental Health (Incubation), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
  • Alison A Motsinger-Reif
    National Institute of Environmental Health Sciences, Biostatistics and Computational Biology Branch, Durham, NC 27713, USA.