NetStart 2.0: prediction of eukaryotic translation initiation sites using a protein language model.

Journal: BMC Bioinformatics
Published Date:

Abstract

BACKGROUND: Accurate identification of translation initiation sites is essential for the proper translation of mRNA into functional proteins. In eukaryotes, the choice of the translation initiation site is influenced by multiple factors, including its proximity to the 5′ end and the local start codon context. Translation initiation sites mark the transition from non-coding to coding regions. This motivates the expectation that the upstream sequence, if translated, would yield a nonsensical sequence of amino acids, while the downstream sequence would correspond to the structured beginning of a protein. This distinction suggests that a protein language model could be used to predict translation initiation sites.
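
The contrast between translated upstream and downstream sequence can be illustrated with a minimal sketch. It is not the NetStart 2.0 implementation: the function names (`translate`, `candidate_windows`) and the `flank_codons` window size are hypothetical, and the sketch only prepares the in-frame peptide windows around each ATG that a protein language model could, in principle, be asked to score.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate codon by codon; unknown codons become 'X', trailing bases are ignored."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def candidate_windows(mrna, flank_codons=10):
    """Yield (position, upstream_peptide, downstream_peptide) for every ATG.

    Both windows are read in the frame of the candidate ATG. The window
    size is an illustrative assumption, not a value taken from the paper.
    """
    seq = mrna.upper().replace("U", "T")
    pos = seq.find("ATG")
    while pos != -1:
        # Step back by whole codons so the upstream window stays in frame.
        n_up = min(flank_codons, pos // 3)
        upstream = seq[pos - 3 * n_up:pos]
        # The downstream window starts at the ATG itself.
        downstream = seq[pos:pos + 3 * (flank_codons + 1)]
        yield pos, translate(upstream), translate(downstream)
        pos = seq.find("ATG", pos + 1)


if __name__ == "__main__":
    for pos, up, down in candidate_windows("GCTTAAGCCACCATGGCTTGA"):
        print(pos, up, down)
```

Under the expectation stated above, the upstream peptide at a true initiation site would tend to look non-protein-like (for example, interrupted by stop codons, shown as `*`), whereas the downstream peptide would resemble the start of a protein.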

Authors

  • Line Sandvad Nielsen
    Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, 2200, Copenhagen, Denmark. line.s.nielsen@bio.ku.dk.
  • Anders Gorm Pedersen
    Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark.
  • Ole Winther
    The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark.
  • Henrik Nielsen
    Department of Bio and Health Informatics, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark.