Supervised model based polycystic ovarian syndrome detection in relation to vitamin D deficiency by exploring different feature selection techniques.
Journal:
Scientific Reports
Published Date:
Aug 26, 2025
Abstract
Owing to urbanization and modern lifestyles, most women today are prone to Polycystic Ovarian Syndrome (PCOS), a hormonal disorder. Although its symptoms are often neglected, the disease seriously affects women's reproductive health. Early detection of PCOS helps in managing several other health attributes closely related to it. This article studies the impact of vitamin D3 in PCOS and non-PCOS individuals. To this end, a tailored dataset with 1368 records and 43 attributes is built. The acquired dataset is first pre-processed: missing values are handled with Probabilistic Principal Component Analysis (PPCA), outliers are detected with the Interquartile Range (IQR), features are standardized with Z-scores, and classes are balanced with SMOTE. Significant features are then selected using filter-based (Chi-Square, ANOVA), wrapper-based (Electric Eel Foraging Optimization Algorithm, EEFOA) and embedded (LASSO, XGBoost) methods. The selected features are used to train Random Forest (RF), k-Nearest Neighbour (k-NN), Decision Tree (DT) and Support Vector Machine (SVM) classifiers. Experimental results show that EEFOA combined with RF achieves the best accuracy of 98.8%, with an F-measure of 98.19%. Explainable Artificial Intelligence (XAI) techniques such as SHAP and LIME are then employed to show feature importance. It is observed that over 40% of PCOS patients are affected by vitamin D3 deficiency or insufficiency.
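To make the pre-processing and evaluation terms in the abstract concrete, the following is a minimal sketch of IQR-based outlier detection, Z-score standardization and the F-measure, using only the Python standard library. It is not the authors' code: the function names, the sample vitamin D3 values and the 1.5 × IQR fence are assumptions chosen for illustration, since the abstract does not state the thresholds or tooling used.

```python
import statistics


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the common Tukey convention, assumed here for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def z_score(values):
    """Standardize values to zero mean and unit population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical vitamin D3 readings (not taken from the study's dataset).
vitamin_d3 = [12.0, 18.5, 22.0, 25.5, 30.0, 95.0]
outliers = iqr_outlier_mask(vitamin_d3)
standardized = z_score(vitamin_d3)
```

In this sketch only the last reading (95.0) falls outside the IQR fences. In a fuller pipeline, steps like these would typically precede class balancing, feature selection and classifier training.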