DeepLINK: Deep learning inference using knockoffs with applications to genomics.

Journal: Proceedings of the National Academy of Sciences of the United States of America
Published Date:

Abstract

We propose a deep learning-based knockoffs inference framework, DeepLINK, that guarantees false discovery rate (FDR) control in high-dimensional settings. DeepLINK is applicable to a broad class of covariate distributions described by possibly nonlinear latent factor models. It consists of two major parts: an autoencoder network for knockoff variable construction and a multilayer perceptron network for feature selection with FDR control. Extensive simulation studies investigate the empirical performance of DeepLINK and show that it achieves FDR control in feature selection with both high selection power and high prediction accuracy. We also apply DeepLINK to three real data applications to demonstrate its practical utility.
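The abstract does not spell out how features are selected once knockoff variables exist. As a minimal sketch, assuming the final step resembles the standard knockoff(+) filter from the knockoffs literature, the code below picks a data-dependent threshold from per-feature importance statistics W (large positive values favor the original feature over its knockoff). The function names and the example values of W are hypothetical and are not taken from the paper.

```python
import numpy as np

def knockoff_threshold(W, q=0.2, plus=True):
    """Smallest threshold t whose estimated false discovery proportion is <= q.

    W    : knockoff statistics, one per feature (assumed input, e.g. derived
           from fitted network weights; how they are computed is not shown).
    q    : target FDR level.
    plus : if True, use the more conservative knockoff+ estimate (adds 1).
    """
    W = np.asarray(W, dtype=float)
    offset = 1.0 if plus else 0.0
    # Candidate thresholds are the distinct nonzero magnitudes of W, ascending.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        false_est = offset + np.sum(W <= -t)   # estimated number of false selections
        selected = max(1, np.sum(W >= t))      # number of selections at threshold t
        if false_est / selected <= q:
            return t
    return np.inf  # no threshold meets the target: select nothing

def knockoff_select(W, q=0.2, plus=True):
    """Indices of features whose statistic reaches the knockoff threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_threshold(W, q, plus)
    return np.flatnonzero(W >= t)

# Hypothetical statistics for ten features.
W_example = [5, 4, 3, 2, -1, 0.5, 6, 7, 8, 9]
print(knockoff_threshold(W_example, q=0.2))
print(knockoff_select(W_example, q=0.2))
```

Negative statistics act as a stand-in count of false positives, which is why the ratio in the loop serves as an estimate of the false discovery proportion at each candidate threshold.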

Authors

  • Zifan Zhu
    Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA 90089.
  • Yingying Fan
    Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089. fanyingy@usc.edu.
  • Yinfei Kong
    Mihaylo College of Business and Economics, California State University Fullerton, Fullerton, CA 92831, USA. yikong@fullerton.edu.
  • Jinchi Lv
    Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089.
  • Fengzhu Sun
    Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA. fsun@usc.edu.