PharaCon: a new framework for identifying bacteriophages via conditional representation learning.
Journal: Bioinformatics (Oxford, England)
PMID: 39992229
Abstract
MOTIVATION: Identifying bacteriophages (phages) within metagenomic sequences is essential for understanding microbial community dynamics. Transformer-based foundation models have been used successfully to address a range of biological problems. However, these models are typically pre-trained with self-supervised tasks that ignore label variance in the pre-training data. This poses a challenge for phage identification, as pre-training on mixed bacterial and phage data may introduce information bias owing to the imbalance between bacterial and phage samples.