Machine learning-based prediction of hearing loss: Findings of the US NHANES from 2003 to 2018.

Journal: Hearing research
PMID:

Abstract

The prevalence of hearing loss (HL) is an escalating public health concern globally. The objective of this study was to use data from the National Health and Nutrition Examination Survey (NHANES) to develop an interpretable machine learning (ML) model for predicting HL. After the inclusion and exclusion criteria were applied, a total of 2814 participants were randomly assigned to one of two groups for the training and validation of the predictive models. We identified the most significant variables using Recursive Feature Elimination and constructed HL prediction models with various ML algorithms. The generalization ability of the models was evaluated via 10-fold cross-validation. Eight different models were compared to develop the optimal prediction model for HL. Subsequently, three interpretation methods, feature importance analysis, generalized linear models (GLM) and restricted cubic splines (RCS), were integrated into a pipeline and embedded in the ML workflow for model interpretation. In this study, the Random Forest (RF) model exhibited superior performance across all evaluation metrics after the data were balanced using the Synthetic Minority Oversampling Technique (SMOTE), particularly excelling in AUC, PR-AUC and F1 score. Feature importance analysis uncovered significant correlations between HL and the top 10 features: age, blood lead (Pb) level, urine thallium (Tl) level, BMI, total energy, urine antimony (Sb) level, vitamin E intake, urine cobalt (Co) level, calcium intake and urine cesium (Cs) level. Moreover, both univariate and multivariate GLMs identified blood Pb [OR (95% CI): 1.169 (1.037, 1.311)] and vitamin E intake [OR (95% CI): 0.776 (0.641, 0.928)] as the main features associated with HL. The RCS analysis further revealed that increased blood Pb level and decreased vitamin E intake correspond to a proportional rise in the anticipated risk of HL after adjustment for confounders.
Our ML models identify key factors that, if validated by future studies, will have important implications for hearing conservation. Furthermore, these ML-based point-of-care prediction models will help overcome barriers to hearing healthcare and enable the efficient allocation of resources by accurately identifying individuals who are in dire need of hearing assessment.
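
As an illustration of how GLM odds ratios such as those reported above are typically obtained, the following minimal Python sketch converts a logistic-regression coefficient into an odds ratio with a Wald-type 95% confidence interval. The function name `odds_ratio_with_wald_ci`, the standard error value, and the choice of a Wald interval are assumptions made for illustration only; the abstract does not state how the reported intervals were computed, and the unit of exposure behind each estimate is not given here.

```python
import math


def odds_ratio_with_wald_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with a Wald-type confidence interval.

    beta : coefficient on the log-odds scale
    se   : standard error of the coefficient
    z    : normal quantile (1.96 gives an approximate 95% interval)
    """
    or_point = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return or_point, lower, upper


# Coefficient chosen so that the odds ratio equals the blood Pb estimate
# quoted in the abstract (1.169). The standard error is HYPOTHETICAL and
# is not taken from the study.
beta_example = math.log(1.169)
se_example = 0.06

or_point, lower, upper = odds_ratio_with_wald_ci(beta_example, se_example)

# Percent change in the odds of HL per one-unit increase in the predictor.
percent_change = (or_point - 1.0) * 100.0

print(f"OR = {or_point:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print(f"Change in odds per unit increase: {percent_change:.1f}%")
```

Read this way, an odds ratio above 1 (as for blood Pb) indicates higher odds of HL per unit increase in the predictor, while an odds ratio below 1 (as for vitamin E intake) indicates lower odds. An interval that excludes 1 is conventionally taken as statistically significant at the corresponding level.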

Authors

  • Yi Mi
    Department of Occupational Health & Toxicology, School of Public Health, Fudan University, Shanghai 200032, PR China.
  • Pin Sun
    Department of Cardiac Ultrasound, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266003, China.