Reproducibility standards for machine learning in the life sciences.

Journal: Nature Methods
Published Date:

Abstract

To make machine learning analyses in the life sciences more computationally reproducible, we propose standards based on data, model, and code publication, programming best practices, and workflow automation. By meeting these standards, the community of researchers applying machine learning methods in the life sciences can ensure that their analyses are worthy of trust.

Authors

  • Benjamin J Heil
    Department of Computer Science.
  • Michael M Hoffman
    Princess Margaret Cancer Centre, Toronto, Ontario, Canada.
  • Florian Markowetz
    Cancer Research UK Cambridge Centre, University of Cambridge, Cambridge CB2 0RE, UK; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK. Electronic address: florian.markowetz@cruk.cam.ac.uk.
  • Su-In Lee
    Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington.
  • Casey S Greene
    Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, United States; Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, United States; Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, United States. Electronic address: csgreene@upenn.edu.
  • Stephanie C Hicks
    Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA. shicks19@jhu.edu.