PASS: Protein Annotation Surveillance Site for Protein Annotation Using Homologous Clusters, NLP, and Sequence Similarity Networks.

Journal: Frontiers in bioinformatics
Published Date:

Abstract

Advances in genome sequencing have accelerated the growth of sequenced genomes but at a cost in the quality of genome annotation. At the same time, computational analysis is widely used for protein annotation, but a dearth of experimental verification has contributed to inaccurate annotation as well as to annotation error propagation. Thus, a tool to help life scientists with accurate protein annotation would be useful. In this work we describe a website we have developed, the Protein Annotation Surveillance Site (PASS), which provides such a tool. This website consists of three major components: a database of homologous clusters of more than eight million protein sequences deduced from the representative genomes of bacteria, archaea, eukarya, and viruses, together with sequence information; a machine-learning software tool which periodically queries the UniprotKB database to determine whether protein function has been experimentally verified; and a query-able webpage where the FASTA headers of sequences from the cluster best matching an input sequence are returned. The user can choose from these sequences to create a sequence similarity network to assist in annotation or else use their expert knowledge to choose an annotation from the cluster sequences. Illustrations demonstrating use of this website are presented.

Authors

  • Jin Tao
    School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States.
  • Kelly A Brayton
    School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States.
  • Shira L Broschat
    School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States.

Keywords

No keywords available for this article.