Detecting false positive sequence homology: a machine learning approach.

Journal: BMC bioinformatics

PMID: 26911862

Abstract

BACKGROUND: Accurate detection of homologous relationships among biological sequences (DNA or amino acid) across organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. Many heuristic tools exist, most commonly based on bidirectional BLAST searches, that identify homologous genes and sort them into two fundamentally distinct classes: orthologs and paralogs. Because these methods rely only on heuristic filtering by significance score cutoffs and have no cluster post-processing tools available, they can often produce multiple clusters containing unrelated (non-homologous) sequences. Sequencing data extracted from incomplete genome/transcriptome assemblies, whether originating from low-coverage sequencing or produced by de novo processes without a reference genome, are therefore susceptible to high false positive rates of homology detection.
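
The bidirectional search heuristic mentioned above can be illustrated with a minimal sketch. This is not the authors' method or any particular tool's implementation. It assumes the hits from the two search directions have already been parsed into tuples of (query, subject, e-value, bit score). The function names, the tuple layout, the toy data, and the default cutoff of 1e-5 are all hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical hit record: (query id, subject id, e-value, bit score).
Hit = Tuple[str, str, float, float]


def best_hits(hits: Iterable[Hit], max_evalue: float) -> Dict[str, str]:
    """Map each query to its highest-scoring subject among hits passing the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # significance cutoff: discard weak hits
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit],
    b_vs_a: Iterable[Hit],
    max_evalue: float = 1e-5,  # assumed cutoff, not taken from the paper
) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Toy data for two hypothetical genomes A and B.
a_vs_b = [
    ("a1", "b1", 1e-50, 200.0),
    ("a1", "b2", 1e-10, 60.0),
    ("a2", "b2", 1e-8, 55.0),
    ("a3", "b3", 0.5, 20.0),
]
b_vs_a = [
    ("b1", "a1", 1e-48, 198.0),
    ("b2", "a1", 1e-9, 58.0),
    ("b3", "a3", 0.4, 21.0),
]

strict_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
loose_pairs = reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1.0)
```

With the assumed default cutoff, only the pair (a1, b1) is kept. Relaxing the cutoff to 1.0 also admits the weak pair (a3, b3), which shows how a filter based purely on a score threshold can let a doubtful pair into a cluster.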

Authors

M Stanley Fujimoto
Computer Science Department, Brigham Young University, Provo, Utah, 84602, USA.

Anton Suvorov
Department of Biology, Brigham Young University, Provo, Utah, 84602, USA. antony.suvorov@byu.edu.

Nicholas O Jensen
Department of Biology, Brigham Young University, Provo, Utah, 84602, USA.

Mark J Clement
Computer Science Department, Brigham Young University, Provo, Utah, 84602, USA.

Seth M Bybee
Department of Biology, Brigham Young University, Provo, Utah, 84602, USA.

Keywords

False Positive Reactions; Machine Learning; Sequence Alignment; Sequence Homology
