Combining Supervised and Unsupervised Machine Learning Methods for Phenotypic Functional Genomics Screening.

Journal: SLAS Discovery: Advancing Life Sciences R&D
Published Date:

Abstract

Machine learning and artificial intelligence (AI) are increasingly used to analyze image-based cellular screens. The accuracy of these analyses, however, depends heavily on the quality of the training sets used to build the machine learning models. We propose that unsupervised exploratory methods should first be applied to the data set to gain better insight into the quality of the data. This improves the selection and labeling of data for creating training sets before machine learning is applied. We demonstrate this using a high-content genome-wide small interfering RNA (siRNA) screen. We perform an unsupervised exploratory data analysis to facilitate the identification of four robust phenotypes, which we subsequently use as a training set for building a high-quality random forest machine learning model that differentiates these phenotypes with an accuracy of 91.1% and a kappa of 0.85. Our approach enhanced our ability to extract new knowledge from the screen when compared with the use of unsupervised methods alone.
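The two-stage workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes scikit-learn, uses synthetic blob data in place of real image features, substitutes PCA plus k-means for the unspecified exploratory analysis, and uses a hypothetical "closer than the cluster median" rule as a stand-in for expert selection of robust phenotype examples. It also assumes that "kappa" refers to Cohen's kappa.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def explore_then_classify(features, n_phenotypes=4, seed=0):
    """Explore with unsupervised methods, then train a random forest.

    Hypothetical helper for illustration only; the cluster labels stand in
    for the manually curated phenotype labels used in a real screen.
    """
    # Unsupervised step: standardize, project to 2 components, and cluster.
    scaled = StandardScaler().fit_transform(features)
    reduced = PCA(n_components=2, random_state=seed).fit_transform(scaled)
    kmeans = KMeans(n_clusters=n_phenotypes, n_init=10, random_state=seed)
    clusters = kmeans.fit_predict(reduced)

    # Assumed selection rule: keep samples no farther from their own cluster
    # centre than that cluster's median distance ("robust" examples).
    all_dist = kmeans.transform(reduced)
    own_dist = all_dist[np.arange(len(clusters)), clusters]
    keep = np.zeros(len(clusters), dtype=bool)
    for c in range(n_phenotypes):
        members = clusters == c
        keep |= members & (own_dist <= np.median(own_dist[members]))

    # Supervised step: random forest on the selected training set.
    x_train, x_test, y_train, y_test = train_test_split(
        scaled[keep], clusters[keep],
        test_size=0.25, random_state=seed, stratify=clusters[keep],
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(x_train, y_train)
    predicted = model.predict(x_test)

    return {
        "n_selected": int(keep.sum()),
        "accuracy": accuracy_score(y_test, predicted),
        "kappa": cohen_kappa_score(y_test, predicted),
    }


# Synthetic stand-in for per-well image features (400 samples, 10 features).
demo_features, _ = make_blobs(
    n_samples=400, n_features=10, centers=4, random_state=0
)
demo_result = explore_then_classify(demo_features)
print(demo_result)
```

Scores on this synthetic data say nothing about the reported 91.1% accuracy or 0.85 kappa; the sketch only shows how an exploratory pass can narrow the data to cleaner examples before a classifier is trained and evaluated on held-out samples.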

Authors

  • Wienand A Omta
    Department of Cell Biology, Centre for Molecular Medicine, UMC Utrecht, Utrecht, The Netherlands.
  • Roy G van Heesbeen
    Department of Cell Biology, NKI-AVL, Amsterdam, Noord-Holland, The Netherlands.
  • Ian Shen
    Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands.
  • Jacob de Nobel
    Department of Cell Biology, Centre for Molecular Medicine, UMC Utrecht, Utrecht, The Netherlands.
  • Desmond Robers
    Department of Cell Biology, Centre for Molecular Medicine, UMC Utrecht, Utrecht, The Netherlands.
  • Lieke M van der Velden
    Department of Cell Biology, Centre for Molecular Medicine, UMC Utrecht, Utrecht, The Netherlands.
  • René H Medema
    Department of Cell Biology, NKI-AVL, Amsterdam, Noord-Holland, The Netherlands.
  • Arno P J M Siebes
    Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands.
  • Ad J Feelders
    Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands.
  • Sjaak Brinkkemper
    Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands.
  • Judith S Klumperman
    Department of Cell Biology, Centre for Molecular Medicine, UMC Utrecht, Utrecht, The Netherlands.
  • Marco René Spruit
    Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands.
  • Matthieu J S Brinkhuis
    Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands.
  • David A Egan
    Core Life Analytics B.V., Utrecht, The Netherlands.