Negative dataset selection impacts machine learning-based predictors for multiple bacterial species promoters.

Journal: Bioinformatics (Oxford, England)

PMID: 40152247

Abstract

MOTIVATION: Advances in machine learning-based bacterial promoter predictors have greatly improved identification metrics. However, existing models have overlooked the impact of negative datasets, an issue previously identified as GC-content discrepancies between positive and negative datasets in single-species models. This study aims to investigate whether multiple-species models for promoter classification are inherently biased by the selection criteria of their negative datasets. We further explore whether generating synthetic random sequences (SRS) that mimic the GC-content distribution of promoters can partly reduce this bias.
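The abstract does not specify how the SRS were generated. The sketch below shows one simple way to build a GC-matched negative set. It assumes each synthetic negative is drawn base by base (i.i.d.) so that its expected GC fraction equals that of one positive promoter of the same length. The function names `gc_content`, `synthetic_random_sequence`, and `srs_negatives` are hypothetical illustrations, not the authors' code.

```python
import random


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def synthetic_random_sequence(length: int, gc: float, rng: random.Random) -> str:
    """Draw an i.i.d. DNA sequence whose expected GC fraction equals `gc`.

    Assumption: G and C share the GC probability equally, and A and T
    share the remaining probability equally.
    """
    weights = [gc / 2, gc / 2, (1 - gc) / 2, (1 - gc) / 2]
    return "".join(rng.choices("GCAT", weights=weights, k=length))


def srs_negatives(positives: list[str], seed: int = 0) -> list[str]:
    """Generate one synthetic negative per positive promoter.

    Each negative matches its positive's length and, in expectation,
    its GC content, so the negative set mirrors the positive set's
    GC-content distribution.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    return [synthetic_random_sequence(len(p), gc_content(p), rng) for p in positives]


if __name__ == "__main__":
    promoters = ["TTGACAATTAATCATCGGCTCGTATAATG", "GCGCGGTTGACACCGCGCTATAATGCGC"]
    negatives = srs_negatives(promoters, seed=42)
    for p, n in zip(promoters, negatives):
        print(f"positive GC={gc_content(p):.2f}  negative GC={gc_content(n):.2f}")
```

Because the bases are sampled independently, each negative matches its positive's GC content only on average. Short sequences can drift noticeably. Matching GC content also leaves other compositional features, such as k-mer frequencies, uncontrolled.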

Authors

Marcelo González

Departamento de Electrónica, Universidad Técnica Federico Santa María, Avenida España 1680, Valparaíso 2390123, Chile.
Roberto E Durán

Laboratorio de Microbiología Molecular y Biotecnología Ambiental, Department of Chemistry & Center of Biotechnology Daniel Alkalay Lowitt, Universidad Técnica Federico Santa María, Avenida España 1680, Valparaíso 2390123, Chile.
Michael Seeger

Laboratorio de Microbiología Molecular y Biotecnología Ambiental, Department of Chemistry & Center of Biotechnology Daniel Alkalay Lowitt, Universidad Técnica Federico Santa María, Avenida España 1680, Valparaíso 2390123, Chile.
Mauricio Araya

Departamento de Electrónica, Universidad Técnica Federico Santa María, Avenida España 1680, Valparaíso 2390123, Chile.
Nicolás Jara

Departamento de Electrónica, Universidad Técnica Federico Santa María, Avenida España 1680, Valparaíso 2390123, Chile.

Keywords

Bacteria; Base Composition; Computational Biology; Genome, Bacterial; Machine Learning; Promoter Regions, Genetic
