Embedding covariate adjustments in tree-based automated machine learning for biomedical big data analyses.

Journal: BMC bioinformatics
Published Date:

Abstract

BACKGROUND: A typical task in bioinformatics consists of identifying which features are associated with a target outcome of interest and building a predictive model. Automated machine learning (AutoML) systems such as the Tree-based Pipeline Optimization Tool (TPOT) constitute an appealing approach to this end. However, biomedical data often include baseline characteristics of the study subjects, or batch effects, that must be adjusted for in order to better isolate the effects of the features of interest on the target. The ability to perform covariate adjustments is therefore particularly important for applications of AutoML to biomedical big data analysis.

Authors

  • Elisabetta Manduchi
    University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
  • Weixuan Fu
    Department of Biostatistics, Epidemiology, and Informatics.
  • Joseph D Romano
    Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA.
  • Stefano Ruberto
    Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA.
  • Jason H Moore
    University of Pennsylvania, Philadelphia, PA, USA.