Constructing machine learning-based risk prediction model for osteoarthritis in population aged 45 and above: NHANES 2011-2018.
Journal: Scientific Reports
PMID: 40275073
Abstract
Osteoarthritis is a widespread chronic joint disease that is becoming increasingly prevalent, particularly among individuals over the age of 45. It causes joint pain and dysfunction and significantly disrupts daily life. The objective of this study was to develop an optimal machine learning model for predicting the risk of osteoarthritis in individuals aged 45 and older. Data came from the National Health and Nutrition Examination Survey (NHANES) 2011-2018 and comprised 2980 individuals, who were randomly divided into a training set (n = 2235) and a validation set (n = 745). Five machine learning algorithms were used to develop predictive models for osteoarthritis, and the SHapley Additive exPlanation (SHAP) method was used to interpret the models and identify the features that contributed most to the predictions. The analysis covered 24 features; the cohort had an average age of 60 years and included 605 participants diagnosed with osteoarthritis. After Recursive Feature Elimination (RFE) reduced the feature set to 20 features, the CatBoost model achieved an AUC of 0.8109 and an accuracy of 0.7315, the best performance among the five models. The most influential factors in the predictions were Gender, Age, BMI, Waist Circumference, and Race. These results indicate that a CatBoost model with 20 features can effectively predict the occurrence of osteoarthritis. Such a model can help inform early interventions and patient management strategies, potentially improving patient prognosis. Further research will focus on improving model performance, for example by incorporating additional relevant features or refining existing ones. Validating the model in more diverse patient populations and investigating its potential for real-time implementation in clinical settings would further increase the study's impact and facilitate its translation into clinical practice.
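The evaluation setup described in the abstract (a random split into training and validation sets, then AUC and accuracy on the validation set) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code. The 75/25 split fraction is inferred from the reported counts (2235 of 2980), and the function names, random seed, 0.5 decision threshold, and toy labels and scores are all assumptions made for illustration; they are not NHANES data or study results.

```python
import random


def split_indices(n, train_fraction=0.75, seed=0):
    """Randomly partition range(n) into training and validation index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is an arbitrary illustrative choice
    n_train = round(n * train_fraction)
    return idx[:n_train], idx[n_train:]


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive case is scored above
    a random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases classified correctly at an assumed probability threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    return sum(int(p == y) for p, y in zip(y_pred, y_true)) / len(y_true)


# Split sizes matching the abstract: 2980 participants at an assumed 75/25 ratio.
train_idx, valid_idx = split_indices(2980)
print(len(train_idx), len(valid_idx))

# Hypothetical labels (1 = osteoarthritis) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7]
print(roc_auc(y_true, y_score), accuracy(y_true, y_score))
```

The pairwise AUC above is quadratic in the number of cases, which is acceptable for a validation set of 745 people; a rank-based formulation gives the same value more efficiently on larger data.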