Predicting Influenza A Tropism with End-to-End Learning of Deep Networks.

Journal: Health Security
PMID:

Abstract

The type of host that a virus can infect, referred to as host specificity or tropism, influences infectivity and is therefore important for disease diagnosis, epidemic response, and prevention. Advances in DNA sequencing technology have enabled rapid metagenomic analyses of viruses, but predicting virus phenotype from genome sequences remains an active area of research. Automatic prediction of host tropism from genomic information would therefore be of considerable utility. Previous research has applied machine learning methods to this task, but deep learning techniques, particularly deep convolutional neural networks (CNNs), have not yet been applied. These techniques can learn to recognize critical hierarchical structures within the genome in a data-driven manner. We designed deep CNN models to identify host tropism for human and avian influenza A viruses based on protein sequences and performed a detailed analysis of the results. Our findings show that deep CNN techniques work as well as existing approaches (with 99% mean accuracy on the binary prediction task) while learning the prediction model end to end, without the need to specify handcrafted features. The findings also show that these models, combined with standard principal component analysis, can be used to quantify and visualize viral strain similarity.
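To make "end-to-end learning without handcrafted features" more concrete, the following minimal Python sketch shows one common way a protein sequence might be presented to a CNN: each residue is one-hot encoded, a filter is slid along the sequence, and the strongest response is kept. The abstract does not state the encoding, filter sizes, or pooling used, so all of these are assumptions, and every function name here (`one_hot_encode`, `conv1d_relu`, `global_max_pool`, `motif_kernel`) is hypothetical. The filter weights are set by hand purely for illustration; in a trained network they would be learned from labeled human and avian sequences.

```python
# Illustrative sketch only: the encoding, filter shape, and pooling are
# assumptions and are not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Map a protein sequence to a list of 20-dimensional one-hot rows.

    Unknown or ambiguous residues (for example 'X') become all-zero rows.
    """
    rows = []
    for residue in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            row[idx] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(encoded, kernel, bias=0.0):
    """Slide one filter (width x 20 weights) along the sequence, then apply ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            row = encoded[start + offset]
            weights = kernel[offset]
            total += sum(w * x for w, x in zip(weights, row))
        outputs.append(max(0.0, total))
    return outputs


def global_max_pool(activations):
    """Collapse position-wise activations into a single feature value."""
    return max(activations) if activations else 0.0


def motif_kernel(motif):
    """Hypothetical hand-set filter that responds most strongly to an exact motif.

    A trained CNN would learn such weights from data instead.
    """
    return one_hot_encode(motif)


if __name__ == "__main__":
    encoded = one_hot_encode("MKTIIALSYIFCLVFA")  # made-up example sequence
    activations = conv1d_relu(encoded, motif_kernel("ALS"))
    print(global_max_pool(activations))
```

In a full model, many such filters would be stacked in several layers and followed by a classifier that outputs the human-versus-avian prediction; the activations of an intermediate layer are the kind of representation that could then be projected with principal component analysis to compare strains.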

Authors

  • Dan Scarafoni
    Dan Scarafoni, MS, is a graduate student, Lab for Computational Behavior Analysis, Georgia Institute of Technology, Atlanta, GA.
  • Brian A Telfer
    MIT Lincoln Laboratory, 244 Wood St, Lexington, MA, 02420, USA.
  • Darrell O Ricke
    Darrell O. Ricke, PhD, is on the Technical Staff, Biological and Chemical Technologies.
  • Jason R Thornton
    Jason R. Thornton, PhD, is Associate Group Leader, Informatics and Decision Support Group.
  • James Comolli
    James Comolli, PhD, is on the Technical Staff, Biological and Chemical Technologies Group; all at the MIT Lincoln Laboratory, Lexington, MA.