Effect of normalization methods on the performance of supervised learning algorithms applied to HTSeq-FPKM-UQ data sets: 7SK RNA expression as a predictor of survival in patients with colon adenocarcinoma.
Journal:
Briefings in bioinformatics
Published Date:
May 21, 2019
Abstract
MOTIVATION: One of the main challenges in machine learning (ML) is choosing an appropriate normalization method. Here, we examine the effect of various normalization methods on analyzing FPKM upper quartile (FPKM-UQ) RNA sequencing data sets. We collect the HTSeq-FPKM-UQ files of patients with colon adenocarcinoma from TCGA-COAD project. We compare three most common normalization methods: scaling, standardizing using z-score and vector normalization by visualizing the normalized data set and evaluating the performance of 12 supervised learning algorithms on the normalized data set. Additionally, for each of these normalization methods, we use two different normalization strategies: normalizing samples (files) or normalizing features (genes).