PRP: pathogenic risk prediction for rare nonsynonymous single nucleotide variants.
Journal:
Human Genetics
Published Date:
May 29, 2025
Abstract
Reliable prediction of pathogenic variants plays a crucial role in personalized medicine, which aims to provide accurate diagnosis and individualized treatment using genomic medicine. This study introduces PRP, a pathogenic risk prediction tool for rare nonsynonymous single nucleotide variants (nsSNVs), including missense, start_lost, stop_gained, and stop_lost variants. PRP was designed to provide robust performance and interpretable predictions using thirty-four features across four categories: frequency, conservation score, substitution metrics, and gene intolerance. Five machine-learning (ML) algorithms were compared to select the optimal model. Hyperparameter optimization was conducted using Optuna, and feature importance was analyzed using Shapley Additive exPlanations (SHAP). PRP was trained on ClinVar data, evaluated on three independent test datasets, and compared with twenty other prediction tools. PRP consistently outperformed state-of-the-art tools across all eight performance metrics: AUC, AUPRC, Accuracy, F1-score, MCC, Precision, Recall, and Specificity. In addition to achieving high sensitivity and high specificity without overestimating the number of pathogenic variants, PRP demonstrates robustness in predicting rare variants. The datasets and code used for training and testing PRP, along with pre-computed scores, are available at https://github.com/DNAvigation/PRP.
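As an illustration of the performance metrics named in the abstract, the sketch below computes Accuracy, F1-score, MCC, Precision, Recall, Specificity, and AUC for a binary pathogenic/benign classifier using their standard definitions. It is not taken from the PRP repository: the function names (`confusion_counts`, `threshold_metrics`, `roc_auc`), the 0.5 cutoff, and the toy labels and scores are all assumptions made for illustration. AUPRC is left out for brevity.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = pathogenic, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_pred):
    """Metrics that depend on a fixed decision threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)          # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration, not from any PRP dataset).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed cutoff of 0.5

metrics = threshold_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise AUC is quadratic in the number of variants, which is fine for a small example; for datasets the size of ClinVar, a rank-based computation or an established library routine would be the more practical choice.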