Leveraging transformers for semi-supervised pathogenicity prediction with soft labels.

Journal: Journal of Integrative Bioinformatics

Published Date: Jun 23, 2025

Abstract

Advances in next-generation sequencing (NGS) technologies have transformed genomics, producing volumes of data that require sophisticated analytical techniques. This paper introduces a deep learning model that predicts the pathogenicity of genetic variants, a key component in advancing personalized medicine. The model is trained on a dataset derived from the analysis of NGS outputs that contains both well-defined and ambiguous genetic variants. A semi-supervised learning approach lets the model use both confidently labeled and less certain data. At the core of the methodology is the Feature Tokenizer Transformer architecture, which processes both numerical and categorical genomic information. The preprocessing pipeline includes data imputation, scaling, and encoding to ensure data quality. The results highlight the model's accuracy, particularly in detecting confidently labeled variants, while also addressing the impact of its predictions on less certain (soft-labeled) data.
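The abstract does not state which loss function is used for the soft-labeled variants. As a minimal sketch of one common way to train on soft labels, the example below computes a binary cross-entropy in which each target may be any value in [0, 1] rather than only 0 or 1. The function name `soft_label_bce`, the target values, and the predicted probabilities are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def soft_label_bce(probs, targets, eps=1e-12):
    """Mean binary cross-entropy with targets allowed anywhere in [0, 1].

    probs   -- predicted probabilities of pathogenicity
    targets -- 1.0 or 0.0 for confident labels, intermediate values for
               ambiguous (soft-labeled) variants
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")
    total = 0.0
    for p, t in zip(probs, targets):
        # Clamp the prediction so that log(0) is never evaluated.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical batch: two confidently labeled variants and two ambiguous ones.
targets = [1.0, 0.0, 0.7, 0.3]
probs = [0.9, 0.1, 0.6, 0.4]
loss = soft_label_bce(probs, targets)
```

With a soft target, this loss is smallest when the predicted probability equals the target itself, so an ambiguous variant pulls the model toward an intermediate score instead of forcing a hard 0 or 1 decision.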

Authors

Pablo Enrique Guillem
AIR Institute, IoT Digital Innovation Hub, Salamanca, Spain.

Marco Zurdo-Tabernero
BISITE Research Group, University of Salamanca, Salamanca, Spain.

Noelia Egido Iglesias
BISITE Research Group, University of Salamanca, Salamanca, Spain.

Ángel Canal-Alonso
BISITE Research Group, University of Salamanca, Salamanca, Spain.

Liliana Durón Figueroa
BISITE Research Group, University of Salamanca, Salamanca, Spain.

Guillermo Hernández
Grupo de Investigación BISITE, Universidad de Salamanca, 37008 Salamanca, Spain.

Angélica González-Arrieta
Grupo de Investigación BISITE, Universidad de Salamanca, 37008 Salamanca, Spain.

Fernando de la Prieta
BISITE Research Group, University of Salamanca, Salamanca, Spain.


External Resources

PubMed ID: 40538169
