Evaluation methodology for deep learning imputation models.

Journal: Experimental biology and medicine (Maywood, N.J.)

Abstract

There is growing interest in using deep learning to impute missing data in tabular datasets. Existing deep learning-based imputation models are commonly evaluated using root mean square error (RMSE) as the predictive accuracy metric. In this article, we investigate the limitations of evaluating deep learning-based imputation models with RMSE by comparing RMSE against alternative metrics from the statistical literature, including qualitative, predictive accuracy, statistical distance, and descriptive statistics metrics. We design a new aggregated metric, called reconstruction loss (RL), to evaluate deep learning-based imputation models. We also develop and evaluate a novel imputation evaluation methodology based on RL. To minimize model and dataset biases, we use a regression imputation model and two deep learning imputation models: denoising autoencoders and generative adversarial nets. We also use two tabular datasets from different industry sectors: health care and finance. Our results show that the proposed methodology is effective in evaluating multiple properties of a deep learning-based imputation model's reconstruction performance.
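As a rough illustration of why RMSE on its own can mislead, the sketch below compares two simple imputers on synthetic data using RMSE over the missing entries and one descriptive statistic (the gap in column standard deviation). It is not the RL metric or the methodology the abstract describes. The data, the 30% missingness rate, and the helper names `masked_rmse` and `std_gap` are all assumptions made for this example.

```python
import numpy as np


def masked_rmse(truth, imputed, mask):
    """RMSE computed only over entries that were missing (mask is True)."""
    diff = (truth - imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def std_gap(truth, imputed):
    """Absolute difference in per-column standard deviation."""
    return np.abs(truth.std(axis=0) - imputed.std(axis=0))


# Synthetic ground truth (assumption: two independent standard normal columns).
rng = np.random.default_rng(0)
truth = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
mask = rng.random(truth.shape) < 0.3  # True where a value is treated as missing

# Imputer A: fill each missing entry with the observed column mean.
observed = np.where(mask, np.nan, truth)
col_means = np.nanmean(observed, axis=0)
mean_imputed = np.where(mask, col_means, truth)

# Imputer B: fill each missing entry with a random draw from the
# observed values of the same column.
draw_imputed = truth.copy()
for j in range(truth.shape[1]):
    missing = mask[:, j]
    pool = truth[~missing, j]
    draw_imputed[missing, j] = rng.choice(pool, size=int(missing.sum()))

rmse_mean = masked_rmse(truth, mean_imputed, mask)
rmse_draw = masked_rmse(truth, draw_imputed, mask)
gap_mean = std_gap(truth, mean_imputed)
gap_draw = std_gap(truth, draw_imputed)

print("RMSE    mean-fill:", rmse_mean, " random-draw:", rmse_draw)
print("std gap mean-fill:", gap_mean, " random-draw:", gap_draw)
```

In this setup mean filling obtains the lower RMSE but shrinks each column's spread, while random draws have the higher RMSE but keep the spread close to the original. Disagreement of this kind between a predictive accuracy metric and a descriptive statistic is the sort of behaviour a multi-metric evaluation is meant to expose.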

Authors

  • Omar Boursalie
    Department of Electrical and Computer Engineering, Toronto Metropolitan University, ON M5B 2K3, Canada.
  • Reza Samavi
    Department of Electrical and Computer Engineering, Toronto Metropolitan University, ON M5B 2K3, Canada.
  • Thomas E Doyle
    Vector Institute, Toronto, ON M5G 1M1, Canada.