Cross-Project Defect Prediction Based on Two-Phase Feature Importance Amplification.

Journal: Computational Intelligence and Neuroscience
Published Date:

Abstract

As a typical application of computational intelligence in software engineering, cross-project defect prediction (CPDP) uses labeled data from other projects (source projects) to build models that predict defects in current projects (target projects), helping testers quickly locate defective modules. However, class imbalance and differences in data distribution among projects make CPDP challenging. To address these two problems, we propose a two-phase feature importance amplification (TFIA) CPDP model that tackles them in both the domain adaptation phase and the classification phase. In the domain adaptation phase, differences in data distribution among projects are reduced by filtering both source and target projects, and correlation-based feature selection with greedy best-first search amplifies the importance of features with strong feature-class correlation. In the classification phase, Random Forest serves as the classifier to further amplify the importance of highly correlated features, yielding a model that is sensitive to them. We conducted both ablation experiments and comparison experiments on the widely used AEEEM database. Experimental results show that TFIA yields a significant improvement in CPDP performance. The TFIA CPDP model is also stable and efficient across all experiments, which lays a solid foundation for its further application in practical engineering.
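The correlation-based feature selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes absolute Pearson correlation as the correlation measure, uses a plain greedy forward search in place of full best-first search with backtracking, and runs on synthetic data with an invented 20% defect rate. The function names (`pearson`, `cfs_merit`, `greedy_cfs`) are hypothetical, and the project-filtering step is not shown.

```python
import math
import random


def pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return abs(cov / math.sqrt(vx * vy))


def cfs_merit(columns, labels, subset):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    # mean feature-class correlation over the subset
    r_cf = sum(pearson(columns[j], labels) for j in subset) / k
    if k == 1:
        return r_cf
    # mean feature-feature correlation over all pairs in the subset
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, labels):
    """Greedy forward search (simplification of best-first search):
    repeatedly add the feature that raises merit most; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        scored = [(cfs_merit(columns, labels, selected + [j]), j) for j in remaining]
        merit, j = max(scored)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected, best


# Synthetic, illustrative data only (assumption: roughly 20% defective modules).
rng = random.Random(0)
labels = [1 if rng.random() < 0.2 else 0 for _ in range(300)]
f0 = [y + rng.gauss(0, 0.3) for y in labels]   # informative feature
f1 = [v + rng.gauss(0, 0.05) for v in f0]      # near-duplicate of f0
f2 = [rng.gauss(0, 1) for _ in labels]         # pure noise
columns = [f0, f1, f2]

selected, best = greedy_cfs(columns, labels)
print("selected feature indices:", selected, "merit:", round(best, 3))
```

In a pipeline along the lines the abstract describes, the selected columns would then be passed to a Random Forest classifier, for example scikit-learn's `RandomForestClassifier`; that step is omitted here to keep the sketch free of third-party dependencies.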

Authors

  • Ying Xing
    Automation School, Beijing University of Posts and Telecommunications, Beijing 100876, China.
  • Wanting Lin
    School of Artificial Intelligence, Beijing University of Posts and Telecommunications, 100876 Beijing, China.
  • Xueyan Lin
    School of Artificial Intelligence, Beijing University of Posts and Telecommunications, 100876 Beijing, China.
  • Bin Yang
    School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, PR China. Electronic address: yangbin@dlut.edu.cn.
  • Zhou Tan
    School of Artificial Intelligence, Beijing University of Posts and Telecommunications, 100876 Beijing, China.