A diagnostic model for polycystic ovary syndrome based on machine learning.

Journal: Scientific Reports
PMID:

Abstract

Diagnosis of polycystic ovary syndrome (PCOS) remains a challenge. In this study, we propose a diagnostic model of PCOS that combines anti-Müllerian hormone (AMH) with steroid hormones and oestrogens, with the aim of providing additional evidence and auxiliary tools for diagnosing this disease. Eighty-four samples from patients diagnosed with PCOS at the First Affiliated Hospital of Zhejiang Chinese Medical University from May 2023 to November 2023 were collected as the case group, and 75 samples from the healthy population of the Health Screening Centre of the same hospital during the same period were collected as the control group. General information (including age, BMI, family history, medication history, etc.) and sex hormone data (including luteinising hormone (LH), follicle-stimulating hormone (FSH), prolactin (PRL), estradiol (E2), testosterone (T), etc.) were collected from all study subjects, and AMH and steroid hormone tests were performed on serum collected from all study subjects. Ten samples from the case group and 10 from the control group were randomly selected as the validation set, and the remaining data were used for model construction. The acquired data were screened for variables, classification models based on machine learning algorithms were constructed, and the constructed models were evaluated and validated for diagnostic efficacy. Ultimately, 8 variables were selected for model construction, namely LH, LH/FSH, E2, PRL, T, AMH, AD, and COR, with AMH having the highest diagnostic potential among them. Five machine learning models were constructed; the logistic classification model had the best overall performance and the support vector machine the weakest. On the validation set, the model achieved an AUC of 0.86.
In this study, five classification models based on machine learning algorithms were successfully constructed. Taking the evaluation metrics of each model together, we concluded that the logistic classification model performed best. However, because this is a single-center study with a small sample size, some metabolic features of PCOS may have been overlooked. In addition, because the validation set comes from the same center as the modelling data, the validation results may not reflect performance at other centers. It is therefore still necessary to expand the sample size and collect multicenter data to establish an external validation dataset and further improve the study.
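
The hold-out design and the AUC metric described in the abstract can be illustrated with a minimal sketch. The abstract does not state which software or splitting procedure was used, so everything below is an assumption for illustration: the function names `holdout_split` and `roc_auc` are hypothetical, the random seed is arbitrary, and the scores passed to `roc_auc` would come from whichever classifier is being evaluated. Only the cohort sizes (84 cases, 75 controls) and the 10-per-group validation set come from the abstract.

```python
import random


def holdout_split(labels, n_per_class=10, seed=0):
    """Hold out n_per_class randomly chosen samples from each class.

    Returns (train_idx, valid_idx) as sorted lists of indices into `labels`.
    Hypothetical helper; not the authors' procedure.
    """
    rng = random.Random(seed)
    valid = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        valid.extend(rng.sample(idx, n_per_class))
    valid_set = set(valid)
    train = [i for i in range(len(labels)) if i not in valid_set]
    return train, sorted(valid)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen case (label 1)
    receives a higher score than a randomly chosen control (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Cohort sizes from the abstract: 84 cases (label 1), 75 controls (label 0).
labels = [1] * 84 + [0] * 75
train_idx, valid_idx = holdout_split(labels)
```

With only 10 cases and 10 controls in the validation set, the AUC is computed from 100 case-control pairs, so each pair shifts the estimate by 0.01; this is one reason a larger external validation set would give a more stable estimate.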

Authors

  • Cheng Tong
    The First Affiliated Hospital of Zhejiang Chinese Medical University (Zhejiang Provincial Hospital of Chinese Medicine), Hangzhou, 310006, Zhejiang, China.
  • Yue Wu
    Key Laboratory of Luminescence and Real-Time Analytical Chemistry (Ministry of Education), College of Pharmaceutical Sciences, Southwest University, Chongqing 400716, China.
  • Zhenchao Zhuang
    Adicon Clinical Laboratories, Hangzhou, 310023, Zhejiang, China. zhuangzzc2015@163.com.
  • Ying Yu
    School of Chemistry and Environment, Guangzhou Key Laboratory of Analytical Chemistry for Biomedicine, South China Normal University, Guangzhou 510006, PR China. Electronic address: yuyhs@scnu.edu.cn.