MicrographCleaner: A python package for cryo-EM micrograph cleaning using deep learning.

Journal: Journal of Structural Biology
Published Date:

Abstract

Cryo-EM Single Particle Analysis workflows require tens of thousands of high-quality particle projections to unveil the three-dimensional structure of macromolecules. Conventional methods for automatic particle picking tend to suffer from high false-positive rates, hampering the reconstruction process. One common cause of this problem is the presence of carbon and different types of high-contrast contaminants. To overcome this limitation, we have developed MicrographCleaner, a deep learning package designed to discriminate, in an automated fashion, between regions of micrographs that are suitable for particle picking and those that are not. MicrographCleaner implements a U-net-like deep learning model trained on a manually curated dataset compiled from over five hundred micrographs. The benchmarking, carried out on approximately one hundred independent micrographs, shows that MicrographCleaner is a very efficient approach for micrograph preprocessing. The MicrographCleaner package (micrograph_cleaner_em) is available on PyPI and Anaconda Cloud, and also as a Scipion/Xmipp protocol. Source code is available at https://github.com/rsanchezgarc/micrograph_cleaner_em.
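
To illustrate how a per-pixel map separating suitable from unsuitable regions could be used downstream, the following is a minimal sketch that discards picked coordinates falling on flagged regions. It does not use the micrograph_cleaner_em API; the function name filter_coordinates, the mask convention (values in [0, 1], higher meaning more likely carbon or contamination), and the 0.5 threshold are assumptions made only for this example.

```python
def filter_coordinates(coords, mask, threshold=0.5):
    """Keep (x, y) picks whose mask score is below `threshold`.

    coords: iterable of integer (x, y) pixel positions (hypothetical picks).
    mask:   2D list indexed as mask[y][x] with scores in [0, 1]; higher
            values are ASSUMED to mean "unsuitable for picking".
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    kept = []
    for x, y in coords:
        # Picks outside the micrograph are dropped rather than raising.
        if 0 <= x < width and 0 <= y < height and mask[y][x] < threshold:
            kept.append((x, y))
    return kept


# Toy 100x100 mask: columns 60 and above are flagged as contaminated.
mask = [[0.9 if x >= 60 else 0.0 for x in range(100)] for y in range(100)]
picks = [(10, 20), (70, 20), (59, 99), (60, 0), (150, 5)]
clean_picks = filter_coordinates(picks, mask)
```

In this toy case only the picks at (10, 20) and (59, 99) survive; the others lie on the flagged columns or outside the image. A real workflow would take the mask from the trained model and the coordinates from a particle picker.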

Authors

  • Ruben Sanchez-Garcia
    GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid, Spain.
  • Joan Segura
    Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, San Diego, CA, United States.
  • David Maluenda
    National Center of Biotechnology (CSIC)/Instruct Image Processing Center, C/ Darwin n° 3, Campus of Cantoblanco, 28049 Madrid, Spain.
  • C O S Sorzano
    GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid, Spain.
  • J M Carazo
    GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid, Spain.