Research on memory failure prediction based on ensemble learning.

Journal: PloS one

PMID: 40267123

Abstract

Timely prediction of memory failures is crucial for the stable operation of data centers. However, existing methods often rely on a single classifier, which can lead to inaccurate or unstable predictions. To address this, we propose an ensemble model for predicting CE-driven memory failures, that is, failures in which a surge of correctable errors (CEs) in memory leads to server downtime. The model combines several strong-performing classifiers, including Random Forest, LightGBM, and XGBoost, and weights each one according to its performance. By optimizing the decision-making process in this way, the model improves prediction accuracy. We validate the model on memory data from Alibaba's data center: it achieves an accuracy of over 84%, outperforming existing single- and dual-classifier models and confirming its strong predictive performance.
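The abstract does not state how the per-classifier weights are derived or how the outputs are combined. The following is a minimal sketch under two labelled assumptions: each classifier's weight is proportional to its validation accuracy, and predictions are combined by weighted soft voting over predicted failure probabilities. The model names, scores, and probabilities are hypothetical placeholders, not results from the paper. In practice the probabilities would come from the `predict_proba` method of trained Random Forest, LightGBM, and XGBoost classifiers.

```python
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of validation samples a classifier labelled correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label sequences")
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def performance_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-classifier validation scores into weights that sum to 1.

    Assumption: weight is simply proportional to the score; the paper's
    exact weighting rule is not given in the abstract.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return {name: s / total for name, s in scores.items()}


def weighted_soft_vote(
    probas: Dict[str, Sequence[float]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> List[int]:
    """Predict 1 (failure) where the weighted failure probability >= threshold.

    probas[name][i] is model `name`'s probability that sample i fails.
    """
    if not probas:
        raise ValueError("need at least one model")
    names = list(probas)
    n = len(probas[names[0]])
    if any(len(probas[m]) != n for m in names):
        raise ValueError("all models must score the same samples")
    preds = []
    for i in range(n):
        # Weights sum to 1, so this is a weighted mean of the probabilities.
        p_fail = sum(weights[m] * probas[m][i] for m in names)
        preds.append(1 if p_fail >= threshold else 0)
    return preds


# Hypothetical validation scores and failure probabilities (placeholders,
# not results from the paper).
scores = {"random_forest": 0.70, "lightgbm": 0.75, "xgboost": 0.80}
probas = {
    "random_forest": [0.9, 0.2, 0.4],
    "lightgbm": [0.8, 0.1, 0.6],
    "xgboost": [0.7, 0.3, 0.7],
}
weights = performance_weights(scores)
print(weighted_soft_vote(probas, weights))  # [1, 0, 1]
```

Soft voting is used here because it lets a confident, better-scoring model outweigh a less certain one on a given sample; the 0.5 threshold is an illustrative default that could be tuned on validation data.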

Authors

Peng Zhang

Key Laboratory of Macromolecular Science of Shaanxi Province, School of Chemistry & Chemical Engineering, Shaanxi Normal University, Xi'an, Shaanxi 710062, China.
Jialiang Zhang

School of Electronic Information Engineering, Xi'an Technological University, Xi'an, Shaanxi, China.
Yi Li

Wuhan Zoncare Bio-Medical Electronics Co., Ltd, Wuhan, China.

Keywords

Algorithms; Ensemble Learning; Humans; Machine Learning; Memory; Models, Theoretical
