gTranslate: rapid and accurate translation table prediction for prokaryotic genomes
Journal: bioRxiv
Published Date: May 28, 2026
Abstract
Background: Bioinformatic tools often require the prediction of protein-coding genes to make inferences about prokaryotic genomes. Typically, the genetic code used for translating genes to proteins must be specified by the user based on the taxonomic classification of a genome assembly or, for some widely used tools, is established using a heuristic rule based on gene coding densities. Manual specification is inconvenient at best. More problematic, many bioinformatic tools are applied before a taxonomic classification has been established, which makes specifying the translation table impractical.

Methods: Here we provide a computationally efficient tool, gTranslate, that uses an ensemble of five machine learning methods to accurately predict translation tables for prokaryotic genomes. The feature vector used by gTranslate takes advantage of differences in gene coding density when genes are predicted under different translation tables, along with features that consider the number and ratio of UGA stop codon reassignments to tryptophan or glycine.

Results: We demonstrate that gTranslate correctly predicts the translation table of prokaryotic genomes >99.99% of the time (i.e., <1 error per 10,000 genomes) and outperforms both a more computationally expensive prediction method and a coding density heuristic used by popular bioinformatic tools. Using gTranslate, we identify a basal lineage of Ca. Stammera capleta that uses the standard bacterial genetic code instead of the UGA stop codon to tryptophan reassignment common to other members of this species. We also identify the first instances of UGA-to-tryptophan reassignment in the Patescibacteriota, making this the first bacterial phylum with members capable of using translation tables 4, 11, and 25.
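
To make the coding density heuristic mentioned in the abstract concrete, the following is a minimal Python sketch. It is not gTranslate's method, and it is not taken from any specific tool. The function names, the gene coordinate format (1-based, inclusive start and end), and the two thresholds (`min_gain`, `min_t4_density`) are assumptions chosen for illustration only. The sketch computes the fraction of a genome covered by predicted genes and then prefers translation table 4 over table 11 only when table 4 yields a clearly higher coding density.

```python
from typing import Iterable, Tuple


def coding_density(genes: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Return the fraction of genome bases covered by at least one gene.

    `genes` holds hypothetical 1-based, inclusive (start, end) coordinates.
    Overlapping genes are counted once.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    covered = 0
    last_end = 0  # rightmost base already counted
    for start, end in sorted(genes):
        # Skip any bases that an earlier gene already covered.
        start = max(start, last_end + 1)
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / genome_length


def pick_table_by_density(
    density_t11: float,
    density_t4: float,
    min_gain: float = 0.05,        # illustrative threshold (assumption)
    min_t4_density: float = 0.70,  # illustrative threshold (assumption)
) -> int:
    """Choose table 4 only if it clearly improves on table 11."""
    if density_t4 - density_t11 > min_gain and density_t4 > min_t4_density:
        return 4
    return 11
```

A rule of this shape can only separate table 11 from a UGA-reassigned code. Tables 4 and 25 both reassign UGA (to tryptophan and glycine, respectively), so coding density alone would not be expected to distinguish them, which is consistent with the abstract's description of additional features based on the number and ratio of UGA reassignments.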