SyntheVAEiser: augmenting traditional machine learning methods with VAE-based gene expression sample generation for improved cancer subtype predictions.

Journal: Genome Biology
Published Date:

Abstract

The accuracy of machine learning methods is often limited by the amount of available training data. We propose improving machine learning training regimes by augmenting datasets with synthetically generated samples. We present a method for synthesizing gene expression samples and test its ability to improve the accuracy of categorical prediction of cancer subtypes. We developed SyntheVAEiser, a variational autoencoder-based tool that was trained and tested on over 8000 cancer samples. We show that this technique can be used to augment machine learning tasks and improve recognition of underrepresented cohorts.

Authors

  • Brian Karlberg
    Biomedical Engineering, Oregon Health and Science University, 3181 S.W. Sam Jackson Park Road, Portland, OR, 97239-3098, USA.
  • Raphael Kirchgaessner
    Biomedical Engineering, Oregon Health and Science University, 3181 S.W. Sam Jackson Park Road, Portland, OR, 97239-3098, USA.
  • Jordan Lee
    Biomedical Engineering, Oregon Health and Science University, 3181 S.W. Sam Jackson Park Road, Portland, OR, 97239-3098, USA.
  • Matthew Peterkort
    Biomedical Engineering, Oregon Health and Science University, 3181 S.W. Sam Jackson Park Road, Portland, OR, 97239-3098, USA.
  • Liam Beckman
    Oregon Health and Science University, Portland, OR 97239, USA.
  • Jeremy Goecks
    Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA. Electronic address: goecksj@ohsu.edu.
  • Kyle Ellrott
    Biomedical Engineering, Oregon Health and Science University, 3181 S.W. Sam Jackson Park Road, Portland, OR, 97239-3098, USA. ellrott@ohsu.edu.