Machine Learning from Omics Data.
Journal:
Methods in molecular biology (Clifton, N.J.)
Published Date:
Jan 1, 2022
Abstract
Machine learning (ML) already accelerates discoveries in many scientific fields and is the driver behind several new products. Recently, growing sample sizes enabled the use of ML approaches in larger omics studies. This work provides a guide through a typical analysis of an omics dataset using ML. As an example, this chapter demonstrates how to build a model predicting Drug-Induced Liver Injury based on transcriptomics data contained in the LINCS L1000 dataset. Each section covers best practices and pitfalls starting from data exploration and model training including hyperparameter search to validation and analysis of the final model. The code to reproduce the results is available at https://github.com/Evotec-Bioinformatics/ml-from-omics .