DeLIVR: a deep learning approach to IV regression for testing nonlinear causal effects in transcriptome-wide association studies.

Journal: Biostatistics (Oxford, England)
PMID:

Abstract

Transcriptome-wide association studies (TWAS) have been increasingly applied to identify (putative) causal genes for complex traits and diseases. TWAS can be regarded as a two-sample two-stage least squares method for instrumental variable (IV) regression for causal inference. The standard TWAS (called TWAS-L) considers only a linear relationship between a gene's expression and a trait in stage 2, and may therefore lose statistical power when the true relationship is not linear. A recent extension of TWAS (called TWAS-LQ) considers both the linear and quadratic effects of a gene on a trait; however, because of its parametric form it is not flexible enough and may have low power for nonlinear effects that are not quadratic. On the other hand, a deep learning (DL) approach, called DeepIV, has been proposed to nonparametrically model a nonlinear effect in IV regression. However, it is both slow and unstable because of the ill-posed inverse problem of solving an integral equation with Monte Carlo approximations. Furthermore, statistical inference, that is, hypothesis testing, was not studied in the original DeepIV approach. Here, we propose a novel DL approach, called DeLIVR, that overcomes the major drawbacks of DeepIV by estimating a related but different target function and by including a hypothesis testing framework. We show through simulations that DeLIVR was both faster and more stable than DeepIV. We applied both the parametric and the DL approaches to the GTEx and UK Biobank data, showing that DeLIVR detected 8 and 7 additional genes nonlinearly associated with high-density lipoprotein (HDL) cholesterol and low-density lipoprotein (LDL) cholesterol, respectively, all of which would be missed by TWAS-L, TWAS-LQ, and DeepIV; these genes include BUD13 associated with HDL, and SLC44A2 and GMIP associated with LDL, all supported by previous studies.
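To make the two-sample two-stage least squares view of TWAS concrete, the following is a minimal Python sketch on simulated data. It is not the authors' code and implements neither DeLIVR nor DeepIV. All names (`design`, `two_stage_fit`, `simulate`), the sample sizes, the SNP weights, and the simulated quadratic effect are hypothetical choices made for illustration. In particular, the quadratic stage 2 here simply squares the imputed expression, which is a simplifying assumption and may differ from how TWAS-LQ handles the quadratic term.

```python
import numpy as np


def design(*columns):
    """Stack an intercept column with the given predictors."""
    n = len(columns[0])
    return np.column_stack([np.ones(n), *columns])


def two_stage_fit(Z_ref, X_ref, Z_gwas, Y_gwas, quadratic=False):
    """Two-sample 2SLS sketch (illustrative, not the published method).

    Stage 1: regress expression X on genotypes Z in a reference sample.
    Stage 2: regress the trait Y on imputed expression in a second sample,
             optionally adding the square of the imputed expression.
    """
    # Stage 1: least-squares SNP weights estimated in the reference sample.
    w, *_ = np.linalg.lstsq(design(Z_ref), X_ref, rcond=None)
    # Impute expression in the trait sample from genotypes only.
    x_hat = design(Z_gwas) @ w
    # Stage 2: linear-only or linear-plus-quadratic model for the trait.
    cols = [x_hat, x_hat**2] if quadratic else [x_hat]
    beta, *_ = np.linalg.lstsq(design(*cols), Y_gwas, rcond=None)
    return beta  # [intercept, linear] or [intercept, linear, quadratic]


def simulate(n, rng, effect):
    """Simulate genotypes Z, expression X, and trait Y with a confounder U."""
    Z = rng.binomial(2, 0.3, size=(n, 5)).astype(float)  # SNP dosages
    U = rng.normal(scale=0.5, size=n)                     # unmeasured confounder
    snp_weights = np.array([0.8, -0.6, 0.5, 0.4, -0.5])   # hypothetical values
    X = Z @ snp_weights + U + rng.normal(scale=0.5, size=n)
    Y = effect(X) - U + rng.normal(scale=0.5, size=n)
    return Z, X, Y


def true_effect(x):
    """Hypothetical purely quadratic effect of expression on the trait."""
    return 0.5 * x**2


rng = np.random.default_rng(0)
# Two independent samples: a smaller expression reference, a larger trait sample.
Z_ref, X_ref, _ = simulate(5_000, rng, true_effect)
Z_gwas, _, Y_gwas = simulate(50_000, rng, true_effect)

beta_l = two_stage_fit(Z_ref, X_ref, Z_gwas, Y_gwas, quadratic=False)
beta_lq = two_stage_fit(Z_ref, X_ref, Z_gwas, Y_gwas, quadratic=True)
print("linear-only stage 2 coefficients:     ", beta_l)
print("linear+quadratic stage 2 coefficients:", beta_lq)
```

In this simulated setting the quadratic stage 2 should recover a coefficient close to the simulated 0.5 on the squared term, whereas the linear-only model has no term that can represent the curvature. A parametric stage 2 of this kind can only capture the functional forms it includes, which is the limitation the abstract gives as motivation for a nonparametric approach.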

Authors

  • Ruoyu He
    Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, 420 Delaware Street SE, Minneapolis, MN 55455 and School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church St SE, Minneapolis, MN 55455.
  • Mingyang Liu
    Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA.
  • Zhaotong Lin
    Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo Building, 420 Delaware Street SE, Minneapolis, MN 55455 and School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church St SE, Minneapolis, MN 55455.
  • Zhong Zhuang
    Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA.
  • Xiaotong Shen
    School of Statistics, University of Minnesota, Minneapolis, Minnesota.
  • Wei Pan