Systematic feature and architecture evaluation reveals tokenized learned embeddings enhance siRNA efficacy prediction

Journal: bioRxiv

Published Date: Jan 1, 2025

Abstract

Recent advances in machine learning have improved the prediction of siRNA efficacy, with graph neural networks and transformer-based encodings leading the way. However, existing models still face challenges: potential inaccuracies in thermodynamic feature calculations (such as selecting the wrong strand when computing the siRNA-mRNA Gibbs free energy), ineffective use of available datasets, and a lack of systematic model refinement. In this study, I systematically evaluated the predictive power of individual features and neural network architectures to identify the most effective configurations. This process led to the development of RN.Ai-Predict, a model built on a tokenized learned embedding of nucleotide sequences. The work demonstrates that a methodical approach to feature selection and hyperparameter tuning, particularly one favoring learned embeddings, can yield a more accurate and reliable model for predicting siRNA efficacy, one that generalizes better than more complex architectures.
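As a rough illustration of what a "tokenized learned embedding" of a nucleotide sequence can look like, the sketch below maps each nucleotide to an integer token and looks up a vector for it in an embedding table. It is not the RN.Ai-Predict implementation: the abstract does not specify the tokenization scheme, so the single-nucleotide vocabulary, the padding token, the 21-nucleotide length, the embedding size of 8, and the example sequence are all assumptions made for illustration. In a trained model the table entries would be parameters updated during training; here they are only randomly initialized.

```python
import random

# Hypothetical vocabulary: one token per RNA nucleotide plus a padding token.
VOCAB = {"<pad>": 0, "A": 1, "C": 2, "G": 3, "U": 4}


def tokenize(seq, max_len=21):
    """Map an RNA sequence to a fixed-length list of integer token ids."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    ids = [VOCAB[nt] for nt in seq[:max_len]]
    # Right-pad short sequences so every input has the same length.
    return ids + [VOCAB["<pad>"]] * (max_len - len(ids))


def init_embedding(vocab_size, dim, seed=0):
    """Create a randomly initialized embedding table (vocab_size x dim)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(ids, table):
    """Look up one embedding vector per token id."""
    return [table[i] for i in ids]


# Illustrative 21-nt sequence (made up, not taken from the paper's data).
table = init_embedding(len(VOCAB), dim=8)
ids = tokenize("GUCAUCACACUGAAUACCAAU")
vectors = embed(ids, table)  # 21 vectors of length 8
```

The contrast with a fixed encoding such as one-hot is that the vectors in `table` are free parameters, so a model can adjust them to whatever representation best predicts efficacy.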

Authors

Rory Coffey

