Scalable Genomic Context Analysis with GCsnap2 on HPC Clusters

Journal: arXiv
Published Date:

Abstract

GCsnap2 Cluster is a scalable, high-performance tool for genomic context analysis, developed to overcome the limitations of its predecessor, GCsnap1 Desktop. Leveraging distributed computing with mpi4py.futures, GCsnap2 Cluster achieved a 22x improvement in execution time and can now perform genomic context analysis for hundreds of thousands of input sequences on HPC clusters. Its modular architecture enables the creation of task-specific workflows and flexible deployment in various computational environments, making it well suited for bioinformatics studies of large-scale datasets. This work highlights the potential for applying similar approaches to solve scalability challenges in other scientific domains that rely on large-scale data analysis pipelines.

Authors

  • Reto Krummenacher
  • Osman Seckin Simsek
  • Michèle Leemann
  • Leila T. Alexander
  • Torsten Schwede
  • Florina M. Ciorba
  • Joana Pereira