Comparison between logistic regression and machine learning algorithms on prediction of noise-induced hearing loss and investigation of SNP loci.
Journal:
Scientific reports
PMID:
40316545
Abstract
This study compared the overall performance of conventional logistic regression (LR) and seven machine learning (ML) algorithms in predicting noise-induced hearing loss (NIHL), and investigated the single nucleotide polymorphism (SNP) loci significantly associated with the occurrence and progression of NIHL. A total of 1,338 noise-exposed workers from 52 enterprises in Jiangsu Province were included. Eighty-eight SNP loci in multiple genes related to noise exposure and hearing loss were analyzed. LR and the ML algorithms were used to build NIHL prediction models, with accuracy, recall, precision, F-score, R and AUC as performance indicators. Compared with conventional LR, the evaluated ML models Generalized Regression Neural Network (GRNN), Probabilistic Neural Network (PNN), and Genetic Algorithm-Random Forests (GA-RF) demonstrated superior performance and were considered the optimal models for processing large-scale SNP datasets. The SNP loci screened by these models are pivotal for NIHL prediction and further improve the prediction accuracy of the models. These findings open new possibilities for accurate prediction of NIHL based on SNP locus screening, and provide a more scientific basis for decision-making in occupational health management.
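As an illustration of the performance indicators named in the abstract, the following minimal Python sketch computes accuracy, recall, precision, F-score, and AUC for a binary classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example, and the "R" indicator is left out because the abstract does not define it.

```python
from typing import Dict, List, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-score for binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case score and a control score count as half a win.
    """
    pos: List[float] = [s for t, s in zip(y_true, scores) if t == 1]
    neg: List[float] = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

    print(classification_metrics(y_true, y_pred))
    print("AUC:", roc_auc(y_true, scores))
```

The pairwise AUC formulation is quadratic in sample size, which is acceptable for a cohort of this scale; a rank-based implementation would be the usual choice for larger datasets.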