A novel seven-tier framework for the classification of MEFV missense variants using adaptive and rigid classifiers.
Journal:
Scientific Reports
PMID:
40090944
Abstract
There is a substantial discrepancy between the clinical categorization of MEFV gene variants and in silico tool predictions. In this study, we developed a seven-tier classification system for MEFV missense variants of unknown significance (VOUS) and recommend a generalized pipeline for classifying variants in other genes. We extracted 12,017 human MEFV gene variants from the Ensembl database, of which 6,034 were missense variants. We then selected 42 in silico tools for our classification model and determined the optimal value using the scores from three in silico tools. For machine learning, we used two bagging methods and two boosting methods. After predicting known variants, we applied our model to 5,507 variants of unknown significance. In the final stage, we applied the developed framework to the entire dataset to evaluate its classification performance and validate its potential clinical utility. The XGBoost model achieved the highest accuracy at 0.9882 (± 0.0295), followed by Extremely Randomized Trees (0.9835 ± 0.0335), Random Forest (0.9788 ± 0.0158), and AdaBoost (0.9671 ± 0.0815). Following the refinement of the dataset and the introduction of a novel classification and clustering methodology, the proportion of known variants increased from 6.9% to 29.4%, a 4.3-fold relative increase. Furthermore, we identified two novel hotspot regions and one tolerant site, offering insights into the functional structure of the pyrin protein. Rigid and adaptive classifiers offer an innovative framework for VOUS classification, integrating a grayscale interpretation system with in silico tools and machine learning algorithms.
This approach not only improves the accuracy of MEFV gene variant classification but also identifies new hotspot regions for functional studies, paving the way for scalable application to other genes, and it may contribute to advancing precision genomic medicine.
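As a quick arithmetic check on the figures reported above, the following minimal sketch recomputes the fold increase in known variants and ranks the four models by their reported accuracy. The names (fold_change, summarize, REPORTED) are illustrative and not taken from the study. The abstract does not say whether the "±" values are standard deviations or another interval, so the per-fold example assumes a sample standard deviation and uses hypothetical accuracies, not study data.

```python
from statistics import mean, stdev

# Figures reported in the abstract: share of known variants (in percent)
# before and after dataset refinement and reclassification.
KNOWN_BEFORE_PCT = 6.9
KNOWN_AFTER_PCT = 29.4

# Reported accuracy and spread per model, copied from the abstract.
REPORTED = {
    "XGBoost": (0.9882, 0.0295),
    "Extremely Randomized Trees": (0.9835, 0.0335),
    "Random Forest": (0.9788, 0.0158),
    "AdaBoost": (0.9671, 0.0815),
}


def fold_change(before_pct, after_pct):
    """Relative increase expressed as the ratio of two percentages."""
    return after_pct / before_pct


def summarize(fold_accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies.

    Assumption: the spread is a sample standard deviation; the abstract
    does not state how its "±" values were computed.
    """
    return mean(fold_accuracies), stdev(fold_accuracies)


ratio = fold_change(KNOWN_BEFORE_PCT, KNOWN_AFTER_PCT)

# Model names ordered from highest to lowest reported accuracy.
ranked = sorted(REPORTED, key=lambda name: REPORTED[name][0], reverse=True)

# Hypothetical per-fold accuracies for illustration only, NOT study data.
hypothetical_folds = [1.0, 0.95, 1.0, 1.0, 0.975]
fold_mean, fold_sd = summarize(hypothetical_folds)

print(f"{ratio:.1f}-fold increase in known variants")
print("Models by reported accuracy:", ", ".join(ranked))
print(f"Hypothetical folds: {fold_mean:.4f} ± {fold_sd:.4f}")
```

The ratio 29.4 / 6.9 is about 4.26, which rounds to the 4.3-fold figure stated in the abstract.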