Scaling tree-based automated machine learning to biomedical big data with a feature set selector.

Journal: Bioinformatics (Oxford, England)

Abstract

MOTIVATION: Automated machine learning (AutoML) systems are helpful data science assistants designed to scan data for novel features, select appropriate supervised learning models and optimize their parameters. For this purpose, the Tree-based Pipeline Optimization Tool (TPOT) was developed; it uses strongly typed genetic programming (GP) to recommend an optimized analysis pipeline for a data scientist's prediction problem. However, like other AutoML systems, TPOT may reach computational resource limits when working on big data such as whole-genome expression data.

Authors

  • Trang T Le
    Department of Biostatistics, Epidemiology, and Informatics.
  • Weixuan Fu
    Department of Biostatistics, Epidemiology, and Informatics.
  • Jason H Moore
    University of Pennsylvania, Philadelphia, PA, USA.