TEM virus images: Benchmark dataset and deep learning classification.

Journal: Computer methods and programs in biomedicine
Published Date:

Abstract

BACKGROUND AND OBJECTIVE: To realize the full potential of deep learning (DL) models, including an understanding of the interplay between model size, training strategy, and amount of training data, researchers and developers need access to new dedicated image datasets, i.e., annotated collections of images representing real-world problems with all their variations, complexity, limitations, and noise. Here, we present, describe, and make freely available an annotated transmission electron microscopy (TEM) image dataset. It poses an interesting challenge for many practical applications in virology and epidemiology, e.g., virus detection, segmentation, classification, and novelty detection. We also present benchmarking results for virus detection and recognition using some of the top-performing large and small networks as well as a very small handcrafted network. We compare and evaluate transfer learning and training from scratch, hypothesizing that with a limited dataset, transfer learning is crucial for good performance of a large network, whereas our small handcrafted network performs relatively well when trained from scratch. This is one step towards understanding how much training data is needed for a given task.

Authors

  • Damian J Matuszewski
    Department of Information Technology, Uppsala University, Uppsala, Sweden. Electronic address: damian.matuszewski@it.uu.se.
  • Ida-Maria Sintorn
    Centre for Image Analysis, Uppsala University, Uppsala, 75124, Sweden.