Revisiting CPUs for Protein Folding: Xeon-Based Acceleration of AlphaFold2
Journal: bioRxiv
Published Date: May 29, 2026
Abstract
Protein structure prediction with AlphaFold2 has revolutionized drug discovery, yet its end-to-end execution remains computationally intensive. While GPUs are traditionally favored for deep learning, the AlphaFold2 pipeline consists of heterogeneous phases that present distinct architectural challenges: preprocessing is dominated by sparse database searches, and model inference is dominated by attention modules with low arithmetic intensity. In this work, we address these bottlenecks with Open-Omics-AlphaFold2, a highly optimized implementation for Intel Xeon CPUs. By leveraging the CPU's versatility in handling both sparse preprocessing algorithms and dense matrix operations via Intel Advanced Matrix Extensions (AMX), we accelerate the entire pipeline end to end. Our optimization strategy combines multi-level parallelism (multiprocessing, multithreading, and vectorization) with cache-aware tiling and operator fusion. On a Xeon CPU, Open-Omics-AlphaFold2 achieves speedups of 2-7.58x for preprocessing and 19.8-29.2x for model inference over the baseline DeepMind-AlphaFold2. Moreover, for a proteome of 391 proteins, Open-Omics-AlphaFold2 running on a dual-socket Intel Xeon 6980P system achieves 76% higher throughput than the state-of-the-art GPU-accelerated solution, FastFold, running on a single-socket Intel Xeon 6980P CPU with an NVIDIA H100 offload.
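The abstract names operator fusion and cache-aware tiling without giving details. To make the idea concrete, here is a minimal plain-Python sketch of one common way to fuse an attention computation: instead of materializing the full score matrix, applying softmax, and then multiplying by V in three separate passes, each query row streams over small tiles of keys and values using an online-softmax formulation. This is an illustration under stated assumptions, not the Open-Omics-AlphaFold2 implementation (which targets AMX kernels on Xeon). The function names `attention_unfused` and `attention_fused_tiled` and the default tile size of 4 are hypothetical choices, and pure Python will not show any cache benefit; the sketch only shows the data flow.

```python
import math
from typing import List

Matrix = List[List[float]]


def attention_unfused(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference (hypothetical helper): three separate passes over a full score matrix."""
    scale = 1.0 / math.sqrt(len(q[0]))
    # Pass 1: materialize S = (Q K^T) * scale, shape N_q x N_k.
    scores = [[scale * sum(a * b for a, b in zip(qrow, krow)) for krow in k] for qrow in q]
    # Pass 2: numerically stable row-wise softmax.
    probs = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    # Pass 3: weighted sum P V.
    dv = len(v[0])
    return [[sum(p[j] * v[j][c] for j in range(len(v))) for c in range(dv)] for p in probs]


def attention_fused_tiled(q: Matrix, k: Matrix, v: Matrix, tile: int = 4) -> Matrix:
    """Fused sketch (hypothetical helper): per query row, stream over key/value
    tiles with an online softmax so the N_q x N_k score matrix is never stored."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qrow in q:
        running_max = -math.inf  # largest score seen so far for this row
        running_sum = 0.0        # softmax denominator relative to running_max
        acc = [0.0] * dv         # unnormalized weighted sum of value rows
        for start in range(0, len(k), tile):
            stop = min(start + tile, len(k))
            # Scores for one small tile of keys only.
            block = [scale * sum(a * b for a, b in zip(qrow, k[j])) for j in range(start, stop)]
            new_max = max(running_max, max(block))
            # Rescale earlier partial results to the new maximum (exp(-inf) == 0.0).
            correction = math.exp(running_max - new_max)
            running_sum *= correction
            acc = [a * correction for a in acc]
            # Accumulate this tile's contribution to the denominator and output.
            for offset, s in enumerate(block):
                w = math.exp(s - new_max)
                running_sum += w
                vrow = v[start + offset]
                for c in range(dv):
                    acc[c] += w * vrow[c]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


if __name__ == "__main__":
    q = [[0.1 * (i + j) for j in range(8)] for i in range(5)]
    k = [[0.05 * (i - j) for j in range(8)] for i in range(10)]
    v = [[float(i * j % 7) for j in range(3)] for i in range(10)]
    ref = attention_unfused(q, k, v)
    fused = attention_fused_tiled(q, k, v, tile=4)
    max_err = max(abs(a - b) for r1, r2 in zip(ref, fused) for a, b in zip(r1, r2))
    print(f"max abs difference between unfused and fused: {max_err:.2e}")
```

Both functions compute the same result up to floating-point rounding. The fused version keeps only one tile of scores per query row alive at a time. In a compiled kernel, that is what lets the working set stay in cache and avoids writing an intermediate score matrix to memory, which matters most for attention layers with low arithmetic intensity.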