An effective deep learning-based approach for splice site identification in gene expression.

Journal: Science Progress
PMID:

Abstract

A crucial stage in eukaryotic gene expression is mRNA splicing, carried out by a protein assembly known as the spliceosome. This step is essential to producing a correctly functioning final gene product. Because eukaryotic genes are interrupted by non-coding introns, splicing removes the introns and joins the exons to create a functional mRNA molecule. However, accurately locating splice sites with molecular biology techniques and other biological approaches is complex and time-consuming. This paper presents a precise and reliable computer-aided diagnosis (CAD) technique for the rapid and correct identification of splice site sequences. The proposed deep learning-based framework uses a long short-term memory (LSTM) network to extract distinct patterns from RNA sequences, enabling rapid and accurate point mutation sequence mapping. The network takes one-hot encoded sequences as input and learns sequential patterns that effectively identify splice sites. A thorough ablation study comparing traditional machine learning, one-dimensional convolutional neural network (1D-CNN), and recurrent neural network (RNN) models was conducted. The proposed LSTM network outperformed existing state-of-the-art approaches, improving accuracy by 3% and 2% on the acceptor and donor site datasets, respectively.
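
The one-hot input representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the alphabet order, the T-to-U normalization, the all-zero row for ambiguous bases, the function name `one_hot_encode`, and the example window are all assumptions made for illustration. Each nucleotide becomes a four-element indicator vector, so a window of length n becomes an n x 4 matrix that a sequence model such as an LSTM could consume one position at a time.

```python
# Illustrative sketch only: one-hot encoding of a nucleotide window.
# Assumptions (not from the paper): alphabet order A, C, G, U; DNA-style T is
# mapped to U; ambiguous bases such as N become an all-zero row.
ALPHABET = "ACGU"


def one_hot_encode(sequence, alphabet=ALPHABET):
    """Return one row per base, each row a one-hot list over `alphabet`."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in sequence.upper().replace("T", "U"):
        row = [0] * len(alphabet)
        if base in index:  # unknown symbols keep the all-zero row
            row[index[base]] = 1
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    window = "CAGGUAAGU"  # hypothetical window around a donor-site GU
    for base, row in zip(window, one_hot_encode(window)):
        print(base, row)
```

An all-zero row for unknown symbols keeps the matrix width fixed at four; a fifth "unknown" channel would be an equally reasonable choice.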

Authors

  • Mohsin Ali
    Institute for Next Generation Healthcare, Mount Sinai Health System, New York, New York; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York.
  • Dilawar Shah
    Department of Computer Science, Bacha Khan University, Charsadda, KP, Pakistan.
  • Shahid Qazi
    Department of Computer Science, Bacha Khan University, Charsadda, KP, Pakistan.
  • Izaz Ahmad Khan
    Department of Computer Science, Bacha Khan University, Charsadda (BKUC), Charsadda 24420, Pakistan.
  • Mohammad Abrar
    Faculty of Computer Studies, Arab Open University, Muscat, Oman.
  • Sana Zahir
    Institute of Computer Sciences and Information Technology, The University of Agriculture Peshawar, Peshawar, KP, Pakistan.