Parallel FFTW on RISC-V: A Comparative Study including OpenMP, MPI, and HPX
Journal: arXiv
Published Date: Jun 10, 2025
Abstract
Rapid advancements in RISC-V hardware development shift the focus from
low-level optimizations to higher-level parallelization. Recent RISC-V
processors, such as the SOPHON SG2042, have 64 cores. RISC-V processors with
core counts comparable to the SG2042 make efficient parallelization as crucial
for RISC-V as it is for more established architectures such as x86-64. In this work, we
evaluate the parallel scaling of the widely used FFTW library on RISC-V for MPI
and OpenMP. We compare RISC-V side by side with a 64-core AMD EPYC 7742 CPU for
different FFTW planning modes. Additionally, we investigate the effect of
memory optimizations on RISC-V in HPX-FFT, a parallel FFT library built on the
asynchronous many-task runtime HPX that uses FFTW as its backend. We generally observe
a performance gap of a factor of eight between the x86-64 and RISC-V chips for
double-precision 2D FFTs. Memory optimizations in HPX-FFT that are effective on
x86-64 do not translate to the RISC-V chip. FFTW with MPI shows good scaling up to 64
cores on x86-64 and RISC-V regardless of planning. In contrast, FFTW with
OpenMP requires measured planning on both architectures to achieve good scaling
up to 64 cores. The results of our study mark an early step on the journey to
large-scale parallel applications running on RISC-V.
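A 2D FFT is commonly computed with the row-column method: independent 1D FFTs over all rows, a transpose, then independent 1D FFTs over all columns. Those independent batches are the work that MPI ranks, OpenMP threads, or HPX tasks can divide among themselves, and distributed implementations typically exchange data at the transpose step. The Python sketch below is a toy illustration of that decomposition under these assumptions; it is not FFTW's, HPX-FFT's, or the authors' implementation, and the names `fft`, `transpose`, `fft2d`, and `dft2d_naive` are hypothetical. Because of Python's global interpreter lock, the thread pool gives no real speedup here; it only marks which work is independent.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def transpose(a):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


def fft2d(a, workers=4):
    """Row-column 2D FFT: 1D FFTs over all rows, transpose, 1D FFTs over all
    former columns, transpose back. Each row/column FFT is independent, so
    the batches are handed to a thread pool to mark the parallel work."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(fft, a))                # pass 1: every row
        cols = list(pool.map(fft, transpose(rows)))  # pass 2: every column
    return transpose(cols)


def dft2d_naive(a):
    """Direct O((MN)^2) 2D DFT, used only as a reference to check fft2d."""
    m, n = len(a), len(a[0])
    return [
        [
            sum(
                a[r][c] * cmath.exp(-2j * cmath.pi * (k * r / m + l * c / n))
                for r in range(m)
                for c in range(n)
            )
            for l in range(n)
        ]
        for k in range(m)
    ]


if __name__ == "__main__":
    data = [[float((3 * r + c * c) % 5) for c in range(8)] for r in range(4)]
    fast, ref = fft2d(data), dft2d_naive(data)
    err = max(abs(x - y) for fr, rr in zip(fast, ref) for x, y in zip(fr, rr))
    print(f"max abs difference vs. direct DFT: {err:.2e}")
```

Running the module compares the row-column result with the direct 2D DFT; the reported difference should be at floating-point rounding level. In a real parallel library, each pass would be a batch of optimized 1D transforms (for example, FFTW plans) rather than this recursive Python FFT.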