UriPred: Machine learning prediction of urinary proteins and identification of biomarkers for liver cancer.

Journal: Computational biology and chemistry

Abstract

Urinary proteins are promising non-invasive biomarkers, but their low abundance and wide dynamic range make detection challenging. This study presents UriPred, a computational tool that integrates machine learning (ML), BLAST, and Motif-EmeRging and Classes-Identification (MERCI) to predict urinary proteins and facilitate the identification of liver cancer (LC) biomarkers. A dataset of 10,588 urinary and non-urinary proteins was curated, from which two feature types were generated: 10,074 compositional and 20 evolutionary features. Seven feature selection methods were applied to the compositional features, and 11 ML algorithms were trained on different feature sets. Evolutionary features achieved the highest training performance (AUC 0.79, accuracy 71.99 %), whereas amino acid composition (AAC) with 20 features achieved an identical validation AUC (0.74) and comparable accuracy while being computationally less expensive and consistently selected. The ML-AAC model was therefore chosen as the final model. This model was integrated with BLAST and MERCI to create UriPred, which reduced false positives from 34.59 % (ML) to 3.12 % (hybrid) on the validation dataset and from 5.8 % (ML) to 0 % (hybrid) on an external dataset. Using UriPred, 53 LC differentially expressed protein-coding genes were predicted as urinary proteins. Protein-protein interaction analysis, AUROC evaluation (AUC > 0.80), survival analysis, and cross-verification of urine detectability with the Human Protein Atlas and Human Urine PeptideAtlas databases identified five proteins (KIF23, COL15A1, CTHRC1, MMP9, and SPP1) as potential LC biomarkers. UriPred efficiently predicts urinary proteins using AAC features and enables biomarker discovery for LC. The tool is publicly available at https://github.com/Dahrii-Paul/UriPred.
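
To illustrate the 20-feature AAC representation mentioned above, the following is a minimal sketch of how an amino acid composition vector can be computed from a protein sequence. It is not taken from the UriPred code base: the function name `amino_acid_composition`, the percentage scaling, and the choice to skip non-standard residues are assumptions made for illustration, and the actual tool may normalize or order its features differently.

```python
from collections import Counter

# The 20 standard amino acids, in alphabetical one-letter order.
# The feature ordering here is an assumption, not UriPred's documented order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the percentage of each standard amino acid in `sequence`.

    Non-standard symbols (e.g. X, B, Z, gaps) are ignored, which is an
    illustrative choice rather than a documented UriPred behaviour.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One feature per residue type: 100 * count / number of counted residues.
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, used only to show the shape of the output.
features = amino_acid_composition("MKTAYIAKQR")
print(len(features))
```

A vector of this kind, one per protein, could then be passed to any standard classifier. The low dimensionality (20 values per sequence) is consistent with the abstract's remark that AAC is computationally less expensive than the larger compositional feature sets.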
