DREAMER: a computational framework to evaluate readiness of datasets for machine learning.

Journal: BMC medical informatics and decision making

Abstract

BACKGROUND: Machine learning (ML) has emerged as the predominant computational paradigm for analyzing large-scale datasets across diverse domains. Assessing dataset quality is a critical prerequisite for successfully deploying ML models. In this study, we introduce DREAMER (Data REAdiness for MachinE learning Research), an algorithmic framework that uses supervised and unsupervised machine learning techniques to automatically evaluate whether tabular datasets are suitable for ML model development. DREAMER is openly available as a tool on GitHub and Docker, which facilitates its adoption and further refinement by the research community.

Authors

  • Meysam Ahangaran
    Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
  • Hanzhi Zhu
    Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
  • Ruihui Li
    Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
  • Lingkai Yin
    Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
  • Joseph Jang
    Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
  • Arnav P Chaudhry
    Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
  • Lindsay A Farrer
    Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA 02118, USA.
  • Rhoda Au
    Boston University School of Medicine. Email: rhodaau@bu.edu.
  • Vijaya B Kolachalama
    Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA.