Complementary feature selection from alternative splicing events and gene expression for phenotype prediction.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: A central task of bioinformatics is to develop sensitive and specific means of providing medical prognoses from biomarker patterns. Common methods for predicting phenotypes from RNA-Seq datasets use machine learning algorithms trained on gene expression. Isoforms generated by alternative splicing, however, may provide a novel and complementary set of transcripts for phenotype prediction. Because of the many alternative splicing patterns, isoforms greatly outnumber genes, which creates a feature prioritization problem for many machine learning algorithms. This study identifies the empirically optimal methods of transcript quantification, feature engineering and filtering, using phenotype prediction accuracy as the metric. It also analyzes the complementary nature of gene and isoform data and examines the feasibility of identifying isoforms as biomarker candidates.

Authors

  • Charles J Labuzzetta
    Department of Mathematics, Iowa State University, Ames, IA 50011, USA.
  • Margaret L Antonio
    Department of Biology, Boston College, Chestnut Hill, MA 02467, USA.
  • Patricia M Watson
    Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, SC 29425, USA.
  • Robert C Wilson
    Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, SC 29425, USA.
  • Lauren A Laboissonniere
    Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA.
  • Jeffrey M Trimarchi
    Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA.
  • Baris Genc
    Ken and Ruth Davee Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA.
  • P Hande Ozdinler
    Ken and Ruth Davee Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA.
  • Dennis K Watson
    Department of Pathology and Laboratory Medicine, Medical University of South Carolina, Charleston, SC 29425, USA.
  • Paul E Anderson
    Department of Computer Science, College of Charleston, Charleston, SC 29424, USA.