DREAMER: a computational framework to evaluate readiness of datasets for machine learning.

Journal: BMC medical informatics and decision making

Published Date: Jun 4, 2024

Abstract

BACKGROUND: Machine learning (ML) has emerged as the predominant computational paradigm for analyzing large-scale datasets across diverse domains. The assessment of dataset quality stands as a pivotal precursor to the successful deployment of ML models. In this study, we introduce DREAMER (Data REAdiness for MachinE learning Research), an algorithmic framework leveraging supervised and unsupervised machine learning techniques to autonomously evaluate the suitability of tabular datasets for ML model development. DREAMER is openly accessible as a tool on GitHub and Docker, facilitating its adoption and further refinement within the research community.

Authors

Meysam Ahangaran

Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
Hanzhi Zhu

Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
Ruihui Li

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
Lingkai Yin

Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
Joseph Jang

Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
Arnav P Chaudhry

Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA.
Lindsay A Farrer

Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA 02118, USA.
Rhoda Au

Boston University School of Medicine, rhodaau@bu.edu.
Vijaya B Kolachalama

Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA.

Keywords

Algorithms; Datasets as Topic; Humans; Machine Learning; Software; Supervised Machine Learning; Unsupervised Machine Learning

External Resources

PubMed ID: 38831432
