NMP-PaK: Near-Memory Processing Acceleration of Scalable De Novo Genome Assembly
Journal: arXiv
Published Date: May 12, 2025
Abstract
De novo assembly enables investigations of unknown genomes, paving the way
for personalized medicine and disease management. However, it faces immense
computational challenges arising from excessive data volumes and
algorithmic complexity.
While state-of-the-art de novo assemblers utilize distributed systems for
extreme-scale genome assembly, they demand substantial computational and memory
resources. They also fail to address the inherent challenges of de novo
assembly, including a large memory footprint, memory-bound behavior, and
irregular data patterns stemming from complex, interdependent data structures.
Given these challenges, de novo assembly merits a custom hardware solution,
though existing approaches have not fully addressed these limitations.
We propose NMP-PaK, a hardware-software co-design that accelerates scalable
de novo genome assembly through near-memory processing (NMP). Our channel-level
NMP architecture addresses memory bottlenecks while providing sufficient
scratchpad space for processing elements. Customized processing elements
maximize parallelism while efficiently handling large data structures that are
both dynamic and interdependent. Software optimizations include customized
batch processing to reduce the memory footprint and hybrid CPU-NMP processing
to address hardware underutilization caused by irregular data patterns.
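
As a rough illustration of why batch processing can shrink the memory footprint, the sketch below counts k-mers one batch of reads at a time and reports the largest number of distinct k-mers held in memory at once. It is not NMP-PaK's implementation: the function names (`batches`, `count_kmers`, `peak_distinct_kmers`), the toy reads, and the use of distinct k-mer count as a stand-in for footprint are all assumptions made for this example, and a real assembler would also have to reconcile k-mers shared across batches.

```python
from collections import Counter
from itertools import islice


def batches(reads, batch_size):
    """Yield successive lists of at most batch_size reads (hypothetical helper)."""
    it = iter(reads)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def count_kmers(reads, k):
    """Count every length-k substring across the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def peak_distinct_kmers(reads, k, batch_size):
    """Largest number of distinct k-mers held at once when each batch
    is counted and then discarded before the next batch starts.
    Used here only as a crude proxy for peak memory footprint."""
    return max(
        (len(count_kmers(batch, k)) for batch in batches(reads, batch_size)),
        default=0,
    )


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GTACGT", "TTTTTT", "CCCCGG"]  # made-up data
    print("all reads at once:", peak_distinct_kmers(toy_reads, 3, len(toy_reads)))
    print("two reads per batch:", peak_distinct_kmers(toy_reads, 3, 2))
```

On the toy reads, the per-batch peak is smaller than the single-pass peak because each batch's table is released before the next is built; the actual savings in any real workload depend on read overlap and batch size.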
NMP-PaK conducts the same genome assembly with a 14X smaller memory
footprint than the state-of-the-art de novo assembler. Moreover,
NMP-PaK delivers a 16X performance improvement over the CPU baseline, with a
2.4X reduction in memory operations. Consequently, NMP-PaK achieves 8.3X
greater throughput than the state-of-the-art de novo assembler under the same
resource constraints, showcasing its superior computational efficiency.