Machine Learning Methods for Predicting Human-Adaptive Influenza A Viruses Based on Viral Nucleotide Compositions.

Journal: Molecular Biology and Evolution
Published Date:

Abstract

Each influenza pandemic was caused at least partly by avian- and/or swine-origin influenza A viruses (IAVs). Neither the timing of the next pandemic nor the IAVs likely to be involved can currently be predicted. We aimed to build machine learning (ML) models that predict the human adaptation of IAVs from viral nucleotide composition. A total of 217,549 IAV full-length coding sequences of the PB2 (polymerase basic protein-2), PB1, PA (polymerase acidic protein), HA (hemagglutinin), NP (nucleoprotein), and NA (neuraminidase) segments were decomposed into their codon position-based mononucleotides (12 nts) and dinucleotides (48 dnts). A total of 68,742 human sequences and 68,739 avian sequences (approximately 1:1) were resampled to characterize the human adaptation-associated (d)nts with principal component analysis (PCA) and other ML models. The human adaptation of IAV sequences was then predicted from the characterized (d)nts. For the six segments, 9, 12, 11, 13, 10, and 9 human-adaptive (d)nts, respectively, were optimized. PCA and hierarchical clustering analysis revealed that the optimized (d)nts linearly separate the human-adaptive and avian-adaptive sets. The confusion matrix and the area under the receiver operating characteristic curve indicated that the ML models performed well in predicting the human adaptation of IAVs. Our model also performed well in predicting the human adaptation of swine/avian IAVs before and after the 2009 H1N1 pandemic. In conclusion, we identified the human adaptation-associated genomic composition of IAV segments. ML models for IAV human adaptation prediction that use large IAV genomic data sets can facilitate the identification of key viral factors that affect virus transmission/pathogenicity. Most importantly, such models allow the prediction of pandemic influenza.
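The decomposition step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that the 12 nts are the four bases counted separately at codon positions 1, 2, and 3, and that the 48 dnts are the 16 dinucleotides counted separately in the three codon frames (positions 1-2, 2-3, and 3-1 spanning adjacent codons). The function name, the feature labels such as `A1` or `GG_31`, and the normalization to within-position frequencies are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
FRAMES = ("12", "23", "31")  # assumed dinucleotide frames within/between codons


def codon_position_features(cds):
    """Return 12 mononucleotide + 48 dinucleotide frequencies for one CDS.

    Assumption: frequencies are normalized within each codon position
    (or dinucleotide frame); the abstract does not state the normalization.
    """
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")

    mono = Counter()
    di = Counter()
    for i, base in enumerate(seq):
        if base in BASES:  # skip ambiguous symbols such as N
            mono[(base, i % 3 + 1)] += 1
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair[0] in BASES and pair[1] in BASES:
            di[(pair, FRAMES[i % 3])] += 1

    features = {}
    for pos in (1, 2, 3):
        total = sum(mono[(b, pos)] for b in BASES)
        for b in BASES:
            features[f"{b}{pos}"] = mono[(b, pos)] / total if total else 0.0
    for frame in FRAMES:
        pairs = ["".join(p) for p in product(BASES, repeat=2)]
        total = sum(di[(p, frame)] for p in pairs)
        for p in pairs:
            features[f"{p}_{frame}"] = di[(p, frame)] / total if total else 0.0
    return features
```

Calling `codon_position_features("ATGGCC")` yields a dictionary of 60 values, one per (d)nt feature. Stacking such vectors for many sequences would give the kind of matrix on which PCA or a classifier could be run.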

Authors

  • Jing Li
    Department of Neurosurgery, Tianjin Medical University General Hospital, Tianjin, China.
  • Sen Zhang
    Department of Gastrointestinal Surgery, Hernia Center, West China Hospital, Sichuan University, Chengdu, China.
  • Bo Li
    Electric Power Research Institute, Yunnan Power Grid Co., Ltd., Kunming, Yunnan, China.
  • Yi Hu
    School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
  • Xiao-Ping Kang
    Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China.
  • Xiao-Yan Wu
    Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China.
  • Meng-Ting Huang
    Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China.
  • Yu-Chang Li
    Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China.
  • Zhong-Peng Zhao
    Department of Infection and Immunology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China.
  • Cheng-Feng Qin
    Department of Virology, State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, China.
  • Tao Jiang
    Department of Respiratory and Critical Care Medicine, Center for Respiratory Medicine, the Fourth Affiliated Hospital of School of Medicine, and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, China.