Predicting viral host codon fitness and path shifting through tree-based learning on codon usage biases and genomic characteristics.

Journal: Scientific Reports
PMID:

Abstract

Viral codon fitness (VCF) with respect to the host, and shifts in VCF, have seldom been studied quantitatively, although both concepts may be vital to understanding pathogen epidemiology. This study demonstrates that the relative synonymous codon usage (RSCU) of virus genomes, together with other genomic properties, is predictive of virus host codon fitness through tree-based machine learning. Statistical analysis of the RSCU data matrix also revealed that the wobble position of virus codons is critically important for distinguishing host codon fitness. Because the trained models characterise the host codon fitness of viruses well, the frequencies and other details stored at the leaf nodes of these models can be reliably translated into a human virus codon fitness score (HVCF score), a readout of the codon fitness of any virus infecting humans. Specifically, we evaluated and compared the HVCF of virus genome sequences from human and other sources, and evaluated the HVCF of SARS-CoV-2 genome sequences from the NCBI Virus database, where we found no obvious trend of host codon fitness shifting towards being non-infectious to humans. We also developed a bioinformatics tool to simulate codon-based virus fitness shifting using the codon compositions of viruses, and found that Tylonycteris bat coronavirus HKU4-related viruses may be closely related to SARS-CoV-2 in terms of human codon fitness. The finding of abundant synonymous mutations in the predicted codon fitness shifting path also provides new insights for evolution research and for virus monitoring in environmental surveillance.
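
For readers unfamiliar with RSCU, the feature named in the abstract, the following is a minimal sketch of the standard textbook definition: for each amino acid, the observed count of a codon divided by the count expected if all synonymous codons were used equally. It is not the authors' pipeline. The function name rscu, the use of the standard genetic code, and the choice to return 0.0 for amino acids absent from the sequence are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the standard table applies to the coding sequence being analysed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (hypothetical helper).

    RSCU = observed codon count * (number of synonymous codons)
           / total count of all codons for that amino acid.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        # Stop codons and single-codon amino acids (Met, Trp) carry no
        # synonymous-usage information, so they are skipped.
        if aa == "*" or len(family) < 2:
            continue
        total = sum(counts[c] for c in family)
        for codon in family:
            # Assumption: amino acids absent from the sequence get 0.0.
            values[codon] = counts[codon] * len(family) / total if total else 0.0
    return values


if __name__ == "__main__":
    # Toy sequence: three alanine codons (GCT, GCT, GCC).
    for codon, value in sorted(rscu("GCTGCTGCC").items()):
        if value:
            print(codon, round(value, 3))
```

An RSCU above 1 marks a codon used more often than expected under uniform synonymous usage, and a value below 1 marks one used less often. A vector of these values per genome is the kind of input a tree-based classifier could be trained on, although the exact feature set used in the study is not specified in this abstract.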

Authors

  • Shuquan Su
    Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China.
  • Zhongran Ni
    Cancer Data Science (CDS), Children's Medical Research Institute (CMRI), ProCan, Westmead, Australia.
  • Tian Lan
  • Pengyao Ping
    School of Computer Science (SoCS), Faculty of Engineering and Information Technology (FEIT), University of Technology Sydney (UTS), Sydney, Australia.
  • Jinling Tang
    Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, China.
  • Zuguo Yu
    Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105, China. yuzuguo@aliyun.com.
  • Gyorgy Hutvagner
    School of Biomedical Engineering, Faculty of Engineering and Information Technology (FEIT), University of Technology Sydney (UTS), Sydney, Australia.
  • Jinyan Li