Optimization of Imbalanced and Multidimensional Learning Under Bayes Minimum Risk and Savings Measure.

Journal: Big Data
Published Date:

Abstract

Imbalanced and high-dimensional data limit what data analysis can achieve, so dimensionality reduction and the handling of class imbalance have attracted substantial research effort, especially in fraud detection. This work investigates how effectively hybrid learning methods alleviate class imbalance when combined with dimensionality reduction techniques. To that end, it examines different classification combinations with the aim of achieving optimal savings and improving classification performance. Several well-known machine learning models are selected: logistic regression, random forest, CatBoost (CB), and XGBoost. These models are built and optimized using Bayes minimum risk (BMR) together with the synthetic minority oversampling technique (SMOTE) and different feature selection (FS) techniques, both univariate and multivariate. To evaluate the proposed approach, the possible scenarios are analyzed with and without balancing, with and without FS, and with optimization using BMR. The main finding concerning the best method to use is that BMR yields good optimization results when combined with SMOTE, symmetrical uncertainty for FS, and CB as the boosted classifier, principally in terms of F1 score and savings.
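As a rough illustration of the two cost-sensitive ideas named in the abstract, the sketch below applies a Bayes minimum risk decision rule to predicted fraud probabilities and then computes a savings score. It is not the authors' implementation. The abstract does not give the cost matrix or the savings formula, so the sketch assumes a commonly used example-dependent formulation: a missed fraud costs the transaction amount, every flagged transaction costs a fixed administrative fee, and savings is the relative cost reduction against the cheaper of the two trivial policies (flag nothing or flag everything). The function names, the fee, and the data are all hypothetical.

```python
# Minimal sketch of Bayes minimum risk (BMR) and a savings score.
# ASSUMPTIONS (not taken from the abstract): the cost matrix, the savings
# definition, all function names, and the toy data below.

def bmr_predict(probs, amounts, admin_cost):
    """Flag a transaction (1) when flagging has lower expected cost than passing it."""
    preds = []
    for p, amount in zip(probs, amounts):
        risk_flag = admin_cost   # assumed C_TP = C_FP = admin_cost
        risk_pass = amount * p   # assumed C_FN = amount, C_TN = 0
        preds.append(1 if risk_flag < risk_pass else 0)
    return preds

def total_cost(y_true, y_pred, amounts, admin_cost):
    """Sum the assumed example-dependent costs of a set of predictions."""
    cost = 0.0
    for y, y_hat, amount in zip(y_true, y_pred, amounts):
        if y_hat == 1:
            cost += admin_cost   # every flagged case is reviewed
        elif y == 1:
            cost += amount       # missed fraud loses the full amount
    return cost

def savings(y_true, y_pred, amounts, admin_cost):
    """Relative cost reduction against the cheaper trivial policy."""
    n = len(y_true)
    base = min(
        total_cost(y_true, [0] * n, amounts, admin_cost),
        total_cost(y_true, [1] * n, amounts, admin_cost),
    )
    if base == 0:
        return 0.0  # no cost to save in this degenerate case
    return (base - total_cost(y_true, y_pred, amounts, admin_cost)) / base

# Hypothetical toy data: labels, classifier probabilities, transaction amounts.
y_true = [0, 0, 1, 1, 0]
probs = [0.10, 0.40, 0.30, 0.80, 0.05]
amounts = [50.0, 20.0, 500.0, 100.0, 1000.0]
admin_cost = 10.0

bmr_preds = bmr_predict(probs, amounts, admin_cost)
threshold_preds = [1 if p >= 0.5 else 0 for p in probs]

bmr_savings = savings(y_true, bmr_preds, amounts, admin_cost)
threshold_savings = savings(y_true, threshold_preds, amounts, admin_cost)
print(bmr_preds, bmr_savings, threshold_savings)
```

Under these assumed costs, the BMR rule flags the low-probability but high-amount fraud that a fixed 0.5 threshold misses, which is why its savings score is higher on this toy data. In a pipeline like the one the abstract describes, the probabilities would come from a classifier trained after oversampling and feature selection.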

Authors

  • Fatima El Barakaz
    Laroseri Laboratory, Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco.
  • Omar Boutkhoum
    Laroseri Laboratory, Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco.
  • Mohamed Hanine
    Department of Telecommunications, Networks and Informatics, LTI Laboratory, ENSA, Chouaib Doukkali University, El Jadida, Morocco.
  • Abdelmajid El Moutaouakkil
    Laroseri Laboratory, Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco.
  • Furqan Rustam
    Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan.
  • Sadia Din
    Department of Computer Engineering, Kyungpook National University, Daegu, South Korea.
  • Imran Ashraf
    Information and Communication Engineering, Yeungnam University, Gyeongsan-si, Daegu, South Korea.