rpcFold: Residual Parallel Convolutional neural network to decipher RNA folding from RNA sequence
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Precise secondary structure information offers deeper insight into the function of many RNA molecules. Earlier approaches, which relied primarily on free-energy minimization, have proved inadequate because RNAs often adopt complex folds, especially as sequence length increases. We present rpcFold, a residual parallel convolutional neural network-based model for RNA secondary structure prediction. From the nucleotide sequence, we compute a base-pairing possibility score using a locally weighted Gaussian function. This score captures intricate canonical and noncanonical pairing patterns, which improves the modelling of both short- and long-range dependencies. These features are mapped into an image that serves as the input to rpcFold. A sliding-window mechanism accommodates sequences of arbitrary length, and rpcFold explicitly addresses the prediction of pseudoknots, which prior work has often overlooked. We demonstrate the performance of rpcFold on both nested and non-nested (pseudoknot) base pairs. When tested on within-family and cross-family benchmark datasets, rpcFold outperforms existing state-of-the-art methods. Additionally, for long nucleotide sequences with complex pseudoknots, rpcFold achieves an F1-score of 71.1% when jointly predicting nested and non-nested pairs. Performance on unseen RNA families further confirms the robustness and adaptability of the approach. The prediction accuracy of rpcFold, particularly for long RNAs and pseudoknots, can advance our understanding of RNA function.
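The abstract does not give the exact form of the locally weighted Gaussian score or the window parameters, so the following is only a minimal sketch of one plausible reading, under stated assumptions. For each candidate pair (i, j), a base weight is added to Gaussian-decayed weights of the neighbouring pairs that would stack on it, stopping at the first position that cannot pair; the resulting L x L matrix acts as a single-channel "image" that is then tiled with a sliding window. The pair weights in `PAIR_WEIGHT`, the parameters `sigma`, `max_offset` and `min_loop`, and the helper names `pairing_score_matrix`, `window_starts` and `crop` are all hypothetical, not rpcFold's published values or code. In particular, this sketch scores only Watson-Crick and G-U wobble pairs and does not model the other noncanonical pairs the abstract mentions.

```python
import math

# Hypothetical, illustrative pair weights (NOT rpcFold's published values):
# G-C > A-U > G-U wobble; every other combination scores 0.
PAIR_WEIGHT = {
    ("G", "C"): 3.0, ("C", "G"): 3.0,
    ("A", "U"): 2.0, ("U", "A"): 2.0,
    ("G", "U"): 0.8, ("U", "G"): 0.8,
}


def pair_weight(a: str, b: str) -> float:
    return PAIR_WEIGHT.get((a, b), 0.0)


def gaussian(k: int, sigma: float) -> float:
    """Gaussian weight for a neighbouring pair k steps from the centre pair."""
    return math.exp(-(k * k) / (2.0 * sigma * sigma))


def pairing_score_matrix(seq, sigma=2.0, max_offset=30, min_loop=3):
    """Symmetric L x L matrix: entry (i, j) sums Gaussian-weighted pair weights
    over the helix that would stack on (i, j). Extension in each direction
    stops at the first position that cannot pair (assumed scheme)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Require more than min_loop unpaired bases between i and j (hairpin loop).
        for j in range(i + min_loop + 1, n):
            centre = pair_weight(seq[i], seq[j])
            if centre == 0.0:
                continue
            total = centre
            # Outward neighbours: (i-1, j+1), (i-2, j+2), ...
            for k in range(1, max_offset + 1):
                a, b = i - k, j + k
                if a < 0 or b >= n:
                    break
                w = pair_weight(seq[a], seq[b])
                if w == 0.0:
                    break
                total += w * gaussian(k, sigma)
            # Inward neighbours: (i+1, j-1), ... while a hairpin loop still fits.
            for k in range(1, max_offset + 1):
                a, b = i + k, j - k
                if b - a <= min_loop:
                    break
                w = pair_weight(seq[a], seq[b])
                if w == 0.0:
                    break
                total += w * gaussian(k, sigma)
            m[i][j] = m[j][i] = total
    return m


def window_starts(length, window, stride):
    """Start offsets of size-`window` windows covering 0..length-1; the last
    window is shifted left so that it ends exactly at `length`."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] + window < length:
        starts.append(length - window)
    return starts


def crop(matrix, r0, c0, size):
    """Square tile of `matrix` whose top-left corner is (r0, c0)."""
    return [row[c0:c0 + size] for row in matrix[r0:r0 + size]]
```

Taking every combination of row and column offsets from `window_starts(n, window, stride)` and passing each to `crop` yields tiles that together cover every (i, j) entry, including distant pairs such as those in pseudoknots. How rpcFold actually sizes its windows and merges per-tile predictions is not described in the abstract.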