Uncertainty Quantification and Temperature Scaling Calibration for Protein-RNA Binding Site Prediction.
Journal:
Journal of Chemical Information and Modeling
Published Date:
Jun 2, 2025
Abstract
The black-box nature of deep learning has drawn increasing attention to the reliability and uncertainty of predictive models. Several uncertainty quantification (UQ) methods have been proposed and successfully applied to molecules and proteins, improving prediction quality and interpretability. Protein-RNA binding is a fundamental aspect of protein research, and accurately predicting binding sites while ensuring that those predictions are reliable is crucial for many scientific endeavors. However, many existing computational methods rely on a single type of feature extraction and lack UQ. To address these limitations, we propose MGCA (multiscale graph convolutional networks, convolutional neural networks, and attention), which better captures local and global information and achieves competitive results in predicting protein-RNA binding sites. We further conduct a UQ study on MGCA and five prevalent models to verify the robustness of the results. Specifically, we use the Expected Calibration Error (ECE) to assess model uncertainty. We also propose a novel split-bins screening method, based on the ECE, to investigate the practical impact of reducing uncertainty on the models. Finally, temperature scaling (TS) is used to calibrate model uncertainty without changing performance. Results show that the split-bins screening method reduces false positives (FP) and that TS significantly decreases model ECE. Combining the split-bins screening method with TS further reduces FP and improves precision. Our findings demonstrate that TS effectively reduces uncertainty in protein-RNA binding site prediction and that minimizing model uncertainty enhances prediction quality. The data and code are available at https://github.com/trustcm/UQ-TS-Split-bins-RBP.
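To make the two calibration ideas named in the abstract concrete, the following is a minimal sketch of ECE and temperature scaling for a binary (binding / non-binding) classifier. It is not the authors' implementation: the function names, the ten equal-width confidence bins, the grid search over the temperature, and the toy logits are all assumptions chosen for illustration. In practice the temperature is usually fitted on a held-out validation set with a gradient-based optimizer; the toy data is reused here only for brevity.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: bin-size-weighted mean of |accuracy - confidence|.

    Confidence is the probability of the predicted class, and bins are
    equal-width over [0, 1] (an assumed binning scheme).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(a for _, a in b) / len(b)
            ece += len(b) / n * abs(acc - avg_conf)
    return ece


def nll(logits, labels, temperature):
    """Mean binary cross-entropy of the temperature-scaled logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_temperature(logits, labels, step=0.05, max_t=20.0):
    """Pick the temperature on a fixed grid that minimizes the NLL.

    A grid search is an assumption made for simplicity.
    """
    candidates = [step * k for k in range(1, int(max_t / step) + 1)]
    return min(candidates, key=lambda t: nll(logits, labels, t))


# Toy, deliberately overconfident model: 70% accurate at ~98% confidence.
logits = [4.0] * 10 + [-4.0] * 10
labels = [1] * 7 + [0] * 3 + [0] * 7 + [1] * 3

probs_raw = [sigmoid(z) for z in logits]
ece_before = expected_calibration_error(probs_raw, labels)

T = fit_temperature(logits, labels)
probs_cal = [sigmoid(z / T) for z in logits]
ece_after = expected_calibration_error(probs_cal, labels)

print(f"T = {T:.2f}, ECE before = {ece_before:.3f}, ECE after = {ece_after:.3f}")
```

Because dividing a logit by a positive temperature never changes its sign, the predicted labels at a 0.5 threshold are identical before and after scaling. This is the sense in which TS calibrates confidence without changing performance. The split-bins screening method is not sketched because the abstract does not describe its procedure.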