Analysis of heterogeneous genomic samples using image normalization and machine learning.

Journal: BMC Genomics
Published Date:

Abstract

BACKGROUND: Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging problems in bioinformatics. Although machine learning models are becoming widely used to analyze sequence data from such populations, their straightforward application is impeded by several challenges: technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures.

Authors

  • Sunitha Basodi
    Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA. sbasodi1@student.gsu.edu.
  • Pelin Icer Baykal
    Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA.
  • Alex Zelikovsky
    Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA.
  • Pavel Skums
    Georgia State University, Atlanta, Georgia, United States of America.
  • Yi Pan
    Department of Neurosis and Psychosomatic Diseases, Huzhou Third Municipal Hospital, The Affiliated Hospital of Huzhou University, Huzhou, Zhejiang, China.