Uncertainty Quantification and Temperature Scaling Calibration for Protein-RNA Binding Site Prediction.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

The black-box nature of deep learning has drawn increasing attention to the reliability and uncertainty of predictive models. Several uncertainty quantification (UQ) methods have been proposed and successfully applied to molecules and proteins, improving model prediction quality and interpretability. Protein-RNA binding is a fundamental aspect of protein research, and both accurate prediction of binding sites and the reliability of those predictions are crucial for various scientific endeavors. However, many existing computational methods rely on a single type of feature extraction and lack UQ. To address these limitations, we propose MGCA (multiscale graph convolutional networks, convolutional neural networks, and attention), which better captures local and global information and achieves competitive results in predicting protein-RNA binding sites. Moreover, we conduct a UQ study based on MGCA and five prevalent models to verify the robustness of the results. Specifically, we use the Expected Calibration Error (ECE) to assess the uncertainty of the models. Additionally, we propose a novel split-bins screening method based on the ECE to investigate the practical impact of reducing uncertainty on the models. Finally, temperature scaling (TS) is used to calibrate model uncertainty without changing performance. Results show that the split-bins screening method reduces false positives (FP) and that TS significantly decreases the model ECE. Combining the split-bins screening method with TS further reduces FP and improves precision. Our findings demonstrate that TS effectively reduces uncertainty in protein-RNA binding site prediction and that minimizing model uncertainty enhances prediction quality. The data and code are available at https://github.com/trustcm/UQ-TS-Split-bins-RBP.
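
As a rough illustration of the two calibration tools named in the abstract, the following Python sketch computes a binned ECE for a binary (binding / non-binding) classifier and fits a single temperature by minimizing negative log-likelihood on held-out logits. It is not the authors' implementation: the function names, the 10-bin setting, the grid search over T, and the synthetic over-confident logits are all assumptions made for this example, and the split-bins screening method is not reproduced because the abstract does not specify it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary predictions (hypothetical helper).

    Confidence is the probability of the predicted class; the gap between
    mean accuracy and mean confidence is averaged over bins, weighted by
    the fraction of samples in each bin.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    preds = (probs >= 0.5).astype(int)
    conf = np.where(preds == 1, probs, 1.0 - probs)
    correct = (preds == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def nll(logits, labels, temperature):
    """Mean binary cross-entropy of the temperature-scaled logits."""
    p = np.clip(sigmoid(logits / temperature), 1e-12, 1.0 - 1e-12)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1.0 - p))

def fit_temperature(logits, labels, grid=None):
    """Pick T > 0 minimizing NLL on held-out data (simple grid search)."""
    if grid is None:
        grid = np.linspace(0.05, 10.0, 200)  # assumed search range
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

# Synthetic demo (assumed data, not from the paper): labels are drawn from
# sigmoid(z), but the "model" reports logits 3*z, so it is over-confident.
rng = np.random.default_rng(0)
z = 2.0 * rng.standard_normal(5000)
labels = (rng.random(5000) < sigmoid(z)).astype(int)
logits = 3.0 * z

T = fit_temperature(logits, labels)
ece_before = expected_calibration_error(sigmoid(logits), labels)
ece_after = expected_calibration_error(sigmoid(logits / T), labels)

# Dividing logits by a positive T never flips their sign, so the
# thresholded predictions (and hence accuracy) stay the same.
same_predictions = np.array_equal(logits >= 0, (logits / T) >= 0)

print(f"T = {T:.2f}, ECE before = {ece_before:.3f}, ECE after = {ece_after:.3f}")
```

Because the scaled logits keep their sign for any positive T, the predicted labels at the 0.5 threshold are unchanged, which is the sense in which temperature scaling calibrates confidence "without changing performance". In practice T would be fitted on a validation split and applied to test logits.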

Authors

  • Ximin Zeng
    Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China.
  • Hongmei Wang
  • Long Zhao
    Department of Respiratory Medicine and Intensive Care Unit, Peking University People's Hospital, Beijing 100044, China.
  • Yue Cheng
  • Danping Zhou
    Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China.
  • Shaoping Shi
    Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang 330031, China. Electronic address: shishaoping@ncu.edu.cn.