Scaling tree-based automated machine learning to biomedical big data with a feature set selector.
Journal:
Bioinformatics (Oxford, England)
Published Date:
Jan 1, 2020
Abstract
MOTIVATION: Automated machine learning (AutoML) systems are helpful data science assistants designed to scan data for novel features, select appropriate supervised learning models and optimize their parameters. For this purpose, Tree-based Pipeline Optimization Tool (TPOT) was developed using strongly typed genetic programing (GP) to recommend an optimized analysis pipeline for the data scientist's prediction problem. However, like other AutoML systems, TPOT may reach computational resource limits when working on big data such as whole-genome expression data.