RNA-seq assistant: machine learning based methods to identify more transcriptional regulated genes.

Journal: BMC genomics
Published Date:

Abstract

BACKGROUND: Although quality controls are applied at several stages of sample preparation and data analysis to ensure that RNA-seq results are reproducible and reliable, limitations and biases remain in the detectability of certain differentially expressed genes (DEGs). Whether the transcriptional dynamics of a gene are captured accurately depends on the experimental design and its execution, and on the subsequent data analysis. Each downstream processing step, including read alignment, transcript quantification, normalization, and the statistical methods used to call DEGs, can affect the accuracy and sensitivity of DEG analysis and introduce false positives or false negatives. Machine learning (ML) is a multidisciplinary field that draws on computer science, artificial intelligence, computational statistics, and information theory to build algorithms that learn from existing data sets and make predictions on new data sets. ML-based differential network analysis has been used to predict stress-responsive genes by learning the patterns of 32 expression characteristics of known stress-related genes. In addition, because epigenetic regulation plays a critical role in gene expression, DNA and histone methylation data have proven to be powerful inputs for ML-based prediction of gene expression in many systems, including lung cancer cells. ML-based methods are therefore a promising way to identify DEGs that conventional RNA-seq analysis fails to detect.
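The core idea described above (train a model on genes whose regulation is already known, then score genes that a conventional DEG cutoff may have missed) can be sketched with a minimal linear classifier. This is an illustrative sketch only, not the authors' method: it uses a plain perceptron as a stand-in for whatever model the study used, two hypothetical features (absolute log2 fold change and a change in promoter histone-mark signal), and made-up toy values rather than real data.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def score(w: Vector, b: float, x: Vector) -> float:
    """Signed distance-like score: > 0 means 'regulated', < 0 means 'unchanged'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_perceptron(X: Sequence[Vector], y: Sequence[int],
                     max_epochs: int = 1000) -> Tuple[List[float], float, bool]:
    """Classic perceptron with labels in {+1, -1}.

    Returns (weights, bias, converged); converged is True once a full pass
    over the training genes makes no mistakes.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) <= 0:
                # Misclassified (or on the boundary): move the hyperplane toward xi.
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False


# Hypothetical, made-up training genes. Features per gene:
# [|log2 fold change| from RNA-seq, change in promoter histone-mark signal].
# +1 = gene known to be regulated, -1 = gene known not to respond.
train_X = [
    [2.1, 1.5], [1.8, 1.2], [0.9, 1.4], [2.5, 0.8], [0.6, 1.8],  # +1
    [0.1, 0.0], [0.3, -0.1], [0.2, 0.2], [0.4, 0.1],             # -1
]
train_y = [1, 1, 1, 1, 1, -1, -1, -1, -1]

w, b, converged = train_perceptron(train_X, train_y)

# candidate_A: fold change below an assumed |log2FC| >= 1 cutoff, but a large
# chromatin change; candidate_B: little change in either feature.
for name, x in [("candidate_A", [0.8, 1.6]), ("candidate_B", [0.25, 0.05])]:
    label = "predicted regulated" if score(w, b, x) > 0 else "predicted unchanged"
    print(name, label)
# prints:
# candidate_A predicted regulated
# candidate_B predicted unchanged
```

Because the two classes in this toy set are linearly separable, the perceptron is guaranteed to converge. Real expression and methylation features are rarely that clean, so a probabilistic model such as logistic regression or a tree ensemble, evaluated on held-out genes, would be the more realistic choice.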

Authors

  • Likai Wang
    Institute for Cellular and Molecular Biology, The University of Texas at Austin, 2506 Speedway, NMS 5.324, Austin, TX, 78712, USA.
  • Yanpeng Xi
    Institute for Cellular and Molecular Biology, The University of Texas at Austin, 2506 Speedway, NMS 5.324, Austin, TX, 78712, USA.
  • Sibum Sung
    Institute for Cellular and Molecular Biology, The University of Texas at Austin, 2506 Speedway, NMS 5.324, Austin, TX, 78712, USA.
  • Hong Qiao
    State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Chinese Academy of Sciences Center for Excellence in Brain Science and Intelligence Technology, Shanghai, China; University of Chinese Academy of Sciences, Beijing, China.