A novel two-stage feature selection method based on random forest and improved genetic algorithm for enhancing classification in machine learning.

Journal: Scientific Reports

Published Date: May 14, 2025

Abstract

Data acquisition methods are becoming increasingly diverse and advanced, which leads to higher-dimensional data, blurred classification boundaries, and overfitting, all of which reduce the accuracy of machine learning models. Many studies have sought to improve model performance through feature selection. However, any single feature selection method tends to be incomplete, unstable, or time-consuming. Combining the strengths of several feature selection methods can help overcome these shortcomings. This paper proposes a two-stage feature selection method based on random forest and an improved genetic algorithm. First, random forest importance scores are calculated and ranked, and features are preliminarily eliminated according to these scores, which reduces the time complexity of the subsequent stage. Then, the improved genetic algorithm searches for the globally optimal feature subset. This stage introduces a multi-objective fitness function that guides the search toward subsets with fewer features and higher classification accuracy. The paper also adds an adaptive mechanism and an evolution strategy to counter the loss of population diversity and the degeneration that occur in later iterations, thereby improving search efficiency. Experimental results on eight UCI datasets show that the proposed method significantly improves classification performance and has excellent feature selection capability.
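The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the fitness formula, the adaptive mechanism, or the evolution strategy, so the weighted-sum fitness (`alpha`), the diversity-based mutation rate, the elitism step, and all function names below are assumptions chosen for illustration. The importance scores are passed in as a plain list (they could come, for example, from scikit-learn's `RandomForestClassifier.feature_importances_`), and `accuracy_on` is a hypothetical callback that returns a classifier's accuracy for a given list of feature indices.

```python
import random


def preselect(importances, keep):
    """Stage 1: rank features by importance score and keep the top `keep` indices."""
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return ranked[:keep]


def fitness(mask, accuracy_fn, alpha=0.9):
    """Weighted sum of accuracy and subset compactness (assumed form, not the paper's)."""
    n_selected = sum(mask)
    if n_selected == 0:
        return 0.0  # an empty subset cannot be evaluated
    compactness = 1.0 - n_selected / len(mask)
    return alpha * accuracy_fn(mask) + (1.0 - alpha) * compactness


def genetic_search(n_features, accuracy_fn, pop_size=20, generations=40, seed=0):
    """Stage 2: genetic search over bit masks with elitism and adaptive mutation."""
    rng = random.Random(seed)

    def fit(mask):
        return fitness(mask, accuracy_fn)

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fit, reverse=True)
        # Adaptive mutation (assumption): mutate more once the population has converged.
        diversity = len({tuple(m) for m in pop}) / pop_size
        p_mut = 0.02 + 0.2 * (1.0 - diversity)
        parents = scored[: pop_size // 2]
        children = [scored[0][:]]  # elitism: carry the current best mask forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fit)


def two_stage_select(importances, accuracy_on, keep, **ga_kwargs):
    """Run stage 1, then search only among the surviving features in stage 2."""
    kept = preselect(importances, keep)

    def accuracy_fn(mask):
        return accuracy_on([kept[i] for i, bit in enumerate(mask) if bit])

    best = genetic_search(len(kept), accuracy_fn, **ga_kwargs)
    return sorted(kept[i] for i, bit in enumerate(best) if bit)


# Toy usage: features 0 and 2 are the only informative ones (made-up data).
toy_importances = [0.30, 0.02, 0.25, 0.01, 0.20, 0.03]


def toy_accuracy(indices):
    return 0.5 + 0.25 * len({0, 2} & set(indices))


selected = two_stage_select(toy_importances, toy_accuracy, keep=4)
print(selected)
```

Because stage 1 shrinks the candidate set before the genetic search starts, each mask in stage 2 is shorter and the search space is smaller, which is the time saving the abstract attributes to the preliminary elimination. In a real experiment `accuracy_on` would typically wrap a cross-validated classifier, and that evaluation would dominate the runtime.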

Authors

Junyao Ding
School of Telecommunications Engineering, Xidian University, Xi'an, China.

Jianchao Du
School of Telecommunications Engineering, Xidian University, Xi'an, China.

Hejie Wang
School of Telecommunications Engineering, Xidian University, Xi'an, 710071, China.

Song Xiao

External Resources

PubMed ID: 40369050
