Machine learning model for early diagnosis of breast cancer based on piRNA expression with CA153.
Journal:
Scientific Reports
Published Date:
Aug 20, 2025
Abstract
PIWI-interacting RNAs (piRNAs) have been implicated in the biological processes of various cancers. This study aimed to investigate the diagnostic potential of circulating piRNAs in breast cancer (BC) using machine learning (ML) frameworks. A serum tri-piRNA signature (piR-139966, piR-2572505, piR-2570061) was selected via piRNA sequencing, validated by qPCR, and then analyzed in combination with related clinical factors. Predictive ML models for the early diagnosis of BC, combining piRNA expression with CA153, were constructed using 10 ML algorithms and evaluated with 8 performance metrics. Serum levels of piR-139966, piR-2572505, and piR-2570061 were significantly upregulated in early-stage BC patients compared with matched healthy controls. The tri-piRNA panel demonstrated enhanced diagnostic precision for BC detection, whether used alone or in combination, and exhibited complementary value to CA153 measurements. Through systematic ML optimization, we developed a stratified diagnostic model in which the XGBoost algorithm showed optimal performance in both the training and validation cohorts for early-stage BC identification. By applying XGBoost to piRNA expression together with CA153, we developed and validated a predictive ML model with superior diagnostic accuracy compared with conventional approaches.
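The abstract does not name the 8 performance metrics used to compare the models. As an illustration only, the sketch below computes several metrics commonly reported for binary diagnostic classifiers (accuracy, sensitivity, specificity, PPV, NPV, F1, and ROC AUC) from predicted probabilities. The function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 = BC and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


def diagnostic_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute threshold-based metrics plus AUC from predicted probabilities."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = safe_div(tp, tp + fn)
    ppv = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "ppv": ppv,
        "npv": safe_div(tn, tn + fn),
        "f1": safe_div(2 * ppv * sensitivity, ppv + sensitivity),
        "auc": roc_auc(y_true, scores),
    }


# Hypothetical toy data: 4 patients and 4 controls with model-predicted probabilities.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_metrics = diagnostic_metrics(toy_labels, toy_scores)
```

In practice, the scores would come from a fitted classifier (for example, an XGBoost model trained on the three piRNA levels plus CA153), and the same function would be applied separately to the training and validation cohorts so that the candidate algorithms can be compared on equal terms.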