A supervised machine learning approach with feature selection for sex-specific biomarker prediction.
Journal:
NPJ systems biology and applications
Published Date:
Jul 1, 2025
Abstract
Biomarkers are crucial for disease diagnosis, prognosis, and treatment selection. Machine learning (ML) has emerged as an effective tool for identifying novel biomarkers and enhancing predictive modelling. However, sex-based bias in ML algorithms remains a concern. This study developed a supervised ML model to predict nine common clinical biomarkers: triglycerides, BMI, waist circumference, systolic blood pressure, blood glucose, uric acid, urinary albumin-to-creatinine ratio, high-density lipoproteins, and albuminuria. The model's predictions were within 5-10% error of actual values. For predictions within 10% error, the top-performing models were those for waist circumference, albuminuria, BMI, blood glucose, and systolic blood pressure. Performance was highest for males, followed by females, then the combined data set with sex as an input feature; the combined data set without sex as an input feature performed the poorest. This study highlighted the benefits of stratifying data according to sex for ML-based models.
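The abstract does not define how "within 10% error" was computed. As a minimal sketch, assuming it means the share of predictions whose relative error against the measured value is at most 10%, the comparison between sex-stratified groups could look like the following. The function names (`within_tolerance_rate`, `rate_by_group`) and the waist-circumference values are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def within_tolerance_rate(actual, predicted, tolerance=0.10):
    """Fraction of predictions with relative error <= tolerance.

    Assumption: "within 10% error" is read as |predicted - actual| / |actual|.
    Pairs with actual == 0 are skipped because relative error is undefined.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    hits = 0
    counted = 0
    for a, p in zip(actual, predicted):
        if a == 0:
            continue
        counted += 1
        if abs(p - a) / abs(a) <= tolerance:
            hits += 1
    return hits / counted if counted else float("nan")


def rate_by_group(records, tolerance=0.10):
    """Compute the within-tolerance rate separately for each group label.

    records: iterable of (group, actual, predicted) tuples.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, a, p in records:
        grouped[group][0].append(a)
        grouped[group][1].append(p)
    return {
        group: within_tolerance_rate(a, p, tolerance)
        for group, (a, p) in grouped.items()
    }


# Made-up waist-circumference measurements and predictions (cm).
records = [
    ("male", 100.0, 105.0),   # 5% error: inside the 10% band
    ("male", 90.0, 98.0),     # about 8.9% error: inside
    ("female", 80.0, 90.0),   # 12.5% error: outside
    ("female", 75.0, 78.0),   # 4% error: inside
]
rates = rate_by_group(records)
print(rates)
```

Passing `tolerance=0.05` gives the stricter 5% band. A full comparison along the lines described in the abstract would also score models fitted to the combined data set, with and without sex as an input feature, using the same metric.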