Classifying Force Spectroscopy of DNA Pulling Measurements Using Supervised and Unsupervised Machine Learning Methods.

Journal: Journal of Chemical Information and Modeling
Published Date:

Abstract

Dynamic force spectroscopy (DFS) measurements on biomolecules typically require classifying thousands of repeated force spectra prior to data analysis. Here, we study classification of atomic force microscope-based DFS measurements using machine-learning algorithms to automate the selection of successful force curves. Notably, we collect a data set with a testable positive signal by measuring photoswitch-modified DNA before and after illumination with UV (365 nm) light. We generate a feature set consisting of six properties of force-distance curves to train supervised models and use principal component analysis (PCA) for an unsupervised model. For supervised classification, we train random forest models for binary and multiclass classification of force-distance curves. Random forest models predict successful pulls with an accuracy of 94% and classify them into five classes with an accuracy of 90%. The unsupervised method using Gaussian mixture models (GMM) reaches an accuracy of approximately 80% for binary classification.
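
The two approaches named in the abstract can be illustrated with a minimal sketch using scikit-learn. This is not the authors' pipeline: the abstract does not name the six features, so the data here are synthetic six-column vectors, and the function names, class separation, and hyperparameters (100 trees, two principal components, two mixture components) are all assumptions made for illustration. The accuracies it produces say nothing about the reported 94%, 90%, or 80% figures.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split


def make_synthetic_features(n_per_class=300, n_features=6, seed=0):
    """Hypothetical stand-in for six per-curve features (not real DFS data)."""
    rng = np.random.default_rng(seed)
    failed = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    successful = rng.normal(3.0, 1.0, size=(n_per_class, n_features))
    X = np.vstack([failed, successful])
    y = np.concatenate([
        np.zeros(n_per_class, dtype=int),  # 0 = failed pull
        np.ones(n_per_class, dtype=int),   # 1 = successful pull
    ])
    return X, y


def supervised_accuracy(X, y, seed=0):
    """Train a random forest on labeled curves and score it on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


def unsupervised_accuracy(X, y, seed=0):
    """Cluster PCA scores with a two-component GMM; labels are used only for scoring."""
    scores = PCA(n_components=2).fit_transform(X)
    clusters = GaussianMixture(n_components=2, random_state=seed).fit_predict(scores)
    agreement = np.mean(clusters == y)
    # Cluster indices are arbitrary, so take the better of the two label assignments.
    return max(agreement, 1.0 - agreement)


if __name__ == "__main__":
    X, y = make_synthetic_features()
    print("random forest accuracy:", supervised_accuracy(X, y))
    print("PCA + GMM accuracy:", unsupervised_accuracy(X, y))
```

For the five-class case, the same random forest call applies unchanged with `y` holding five integer labels. Real features on different scales would likely need standardizing before PCA, a step this sketch omits because the synthetic columns share one scale.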

Authors

  • Durmus U Karatay
    Department of Chemistry, University of Washington, Seattle, Washington 98195, United States.
  • Jie Zhang
    College of Physical Education and Health, Linyi University, Linyi, Shandong, China.
  • Jeffrey S Harrison
    Department of Chemistry, University of Washington, Seattle, Washington 98195, United States.
  • David S Ginger
    Department of Chemistry, University of Washington, Seattle, Washington 98195, United States.