Machine learning approach identifies prominent codons from different degenerate groups influencing gene expression in bacteria.

Journal: Genes to cells : devoted to molecular & cellular mechanisms
Published Date:

Abstract

Unequal usage of synonymous codons is known as codon usage bias (CUB). CUB generally differs between high-expression genes (HEG) and low-expression genes (LEG) within an organism, but this difference has not yet been adequately reported across different bacteria. In this study, a machine learning-based approach was first implemented to find codons whose usage differs significantly between the HEG and LEG in Escherichia coli. It identified the Cys codons UGU and UGC and the Lys codons AAA and AAG as the least influenced by gene expression. Codons such as UCU (Ser), CUG (Leu), GGG (Gly), and CGG (Arg) were identified as the most strongly influenced by gene expression. The study was then extended to analyze codon usage in 683 other bacterial species. The Cys (UGU/UGC) and Ser (AGU/AGC) codons were identified as the least different between the two groups of genes across these bacterial species. Codons such as CGA, CUG, GGG, GCC, ACC, AUA, and AUC were identified as influenced by gene expression across the majority of these species. This study supports the role of CUB in gene expression across bacteria and demonstrates a commonality among bacteria in the behavior of certain codons with respect to gene expression.
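
The comparison described in the abstract starts from per-codon usage within each synonymous (degenerate) group, computed separately for the HEG and LEG sets. The sketch below illustrates only that first step. It is not the authors' machine learning method, which the abstract does not specify. The function names, the small subset of codon groups, and the use of a simple usage difference are all assumptions made for illustration.

```python
from collections import Counter

# Illustrative subset of synonymous codon groups (RNA alphabet).
# A real analysis would cover every degenerate group in the genetic code.
SYNONYMOUS = {
    "Cys": ["UGU", "UGC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGU", "GGC", "GGA", "GGG"],
}


def codon_counts(sequences):
    """Count codons over in-frame coding sequences (DNA or RNA letters)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("T", "U")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def relative_usage(sequences):
    """Fraction of each codon within its own synonymous group."""
    counts = codon_counts(sequences)
    usage = {}
    for codons in SYNONYMOUS.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            usage[c] = counts[c] / total if total else 0.0
    return usage


def usage_difference(heg_sequences, leg_sequences):
    """Per-codon usage in HEG minus usage in LEG (hypothetical helper)."""
    heg = relative_usage(heg_sequences)
    leg = relative_usage(leg_sequences)
    return {codon: heg[codon] - leg[codon] for codon in heg}
```

Codons with a difference near zero would correspond to those described as least influenced by gene expression, and large absolute differences to those most influenced. Per-gene versions of these fractions could serve as input features for a classifier that separates HEG from LEG, although the abstract does not state which features or model the study used.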

Authors

  • Piyali Sen
    Moorfields Eye Hospital, London, United Kingdom.
  • Annushree Kurmi
    Department of Computer Science and Engineering, Tezpur University, Tezpur, Assam, India.
  • Suvendra Kumar Ray
    Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam, India.
  • Siddhartha Sankar Satapathy
    Department of Computer Science and Engineering, Tezpur University, Tezpur, Assam, India.