A machine-learning-driven data labeling pipeline for scientific analysis in .

Journal: Journal of Applied Crystallography

Abstract

This study introduces a novel labeling pipeline to accelerate the labeling process of scientific data sets by using artificial intelligence (AI)-guided tagging techniques. This pipeline includes a set of interconnected web-based graphical user interfaces (GUIs), where and enable the preparation of machine learning (ML) models for data reduction and classification, respectively, while is used for label assignment. Throughout this pipeline, data can be accessed through a direct connection to a file system or through for access through Hypertext Transfer Protocol (HTTP). Our experimental results present three use cases where this labeling pipeline has been instrumental for the study of large X-ray scattering data sets in the area of pattern recognition, the remote analysis of resonant soft X-ray scattering data and the fine-tuning process of foundation models. These use cases highlight the labeling capabilities of this pipeline, including the ability to label large data sets in a short period of time, to perform remote data analysis while minimizing data movement and to enhance the fine-tuning process of complex ML models with human involvement.

Authors

  • Tanny Chavez
    Advanced Light Source, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  • Zhuowen Zhao
    Advanced Light Source, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  • Runbo Jiang
    Advanced Light Source, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  • Wiebke Koepp
    Advanced Light Source, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  • Dylan McReynolds
    Advanced Light Source, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  • Petrus H Zwart
    Center for Advanced Mathematics for Energy Research Applications, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  • Daniel B Allan
    National Synchrotron Light Source II, Brookhaven National Laboratory, Upton, NY 11973, USA.
  • Eliot H Gann
    National Synchrotron Light Source II, Brookhaven National Laboratory, Upton, NY 11973, USA.
  • Nicholas Schwarz
    X-ray Science Division, Advanced Photon Source, Argonne National Laboratory, 9700 South Cass Avenue, Lemont, IL 60439, USA.
  • Daniela Ushizima
    Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA. dushizima@lbl.gov.
  • Edward S Barnard
    The Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  • Apurva Mehta
    Linac Coherent Light Source, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA.
  • Subramanian Sankaranarayanan
    Center for Nanoscale Materials, Argonne National Laboratory, Lemont, IL 60439, USA.
  • Alexander Hexemer
    Advanced Light Source, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
