Scalable Genomic Context Analysis with GCsnap2 on HPC Clusters
Journal: arXiv
Published Date: May 4, 2025
Abstract
GCsnap2 Cluster is a scalable, high-performance tool for genomic context
analysis, developed to overcome the limitations of its predecessor, GCsnap1
Desktop. Leveraging distributed computing with mpi4py.futures, GCsnap2
Cluster achieved a 22x improvement in execution time and can now perform
genomic context analysis for hundreds of thousands of input sequences on HPC
clusters. Its modular architecture enables the creation of task-specific
workflows and flexible deployment in various computational environments, making
it well suited for bioinformatics studies of large-scale datasets. This work
highlights the potential for applying similar approaches to solve scalability
challenges in other scientific domains that rely on large-scale data analysis
pipelines.
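
The abstract names mpi4py.futures as the distribution layer but does not show how work is divided. The following is a minimal sketch of one plausible pattern, not the GCsnap2 implementation: input identifiers are split into batches and each batch is mapped over an executor. The function names (split_into_batches, process_batch, run_step) and the toy per-batch computation are hypothetical. The code is written against the standard concurrent.futures executor interface, which mpi4py.futures.MPIPoolExecutor also provides, so a thread pool stands in here to keep the sketch runnable without an MPI installation.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_into_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split the input identifiers into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def process_batch(batch: List[str]) -> List[Tuple[str, int]]:
    """Hypothetical stand-in for one analysis step applied to a batch.

    A real step would do the per-sequence work; here each identifier is
    simply paired with its length so the example has a checkable result.
    Defined at module level so it can be pickled and sent to workers.
    """
    return [(seq_id, len(seq_id)) for seq_id in batch]


def run_step(executor: Executor, items: Sequence[str], batch_size: int) -> List[Tuple[str, int]]:
    """Distribute batches over the executor and flatten the results.

    executor.map yields results in the order of the input batches, so the
    output order matches the input order.
    """
    results: List[Tuple[str, int]] = []
    for batch_result in executor.map(process_batch, split_into_batches(items, batch_size)):
        results.extend(batch_result)
    return results


if __name__ == "__main__":
    sequence_ids = ["WP_000001", "WP_000002", "WP_000003"]  # made-up identifiers
    # Local stand-in for the MPI pool; see the note after the block.
    with ThreadPoolExecutor(max_workers=2) as pool:
        for seq_id, value in run_step(pool, sequence_ids, batch_size=2):
            print(seq_id, value)
```

On a cluster, the same run_step function could be given an MPIPoolExecutor imported from mpi4py.futures in place of the thread pool, with the script launched through the mpi4py.futures module runner, for example mpiexec -n 4 python -m mpi4py.futures script.py, where script.py is a placeholder name. Keeping the analysis logic independent of the executor type is one way a modular design can run both on a desktop and on a cluster.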