BladeDISC++: Memory Optimizations Based On Symbolic Shape
Journal: arXiv
Published Date: Dec 22, 2024
Abstract
Recent deep learning workloads exhibit dynamic characteristics, leading to
the rising adoption of dynamic shape compilers. These compilers can generate
efficient kernels for dynamic shape graphs characterized by a fixed graph
topology and uncertain tensor shapes. However, memory optimization, although
particularly crucial in the era of large models, remains relatively underexplored
for dynamic shape graphs. The fundamental challenge lies in the lack of precise
tensor shapes, which are essential in conventional methods such as operation
scheduling (op scheduling) and rematerialization. To address this challenge, we
propose op scheduling and rematerialization approaches based on symbolic shapes
and develop BladeDISC++. Moreover, because rematerialization decisions cannot be
made solely at compile time when tensor shapes are unknown, BladeDISC++ employs
a combined compile-time and runtime strategy to optimally address shape dynamics.
Evaluations indicate that BladeDISC++ effectively reduces memory usage for
dynamic shape graphs, achieving memory consumption comparable to optimizations
using precise shapes, thereby promoting the broader adoption of dynamic shape
compilers.
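
To make the idea of reasoning about memory with symbolic shapes more concrete, the following is a minimal sketch and not BladeDISC++'s actual implementation. It assumes tensor sizes can be written as a constant times a product of symbolic dimensions; the names `SymSize`, `compare`, and `should_rematerialize` are hypothetical and introduced only for illustration. Sizes over the same symbols can be ordered at compile time, while the remaining decisions are deferred until concrete dimension values are known at runtime.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class SymSize:
    """Hypothetical element count of a tensor: coeff * product of symbolic dims."""
    coeff: int
    dims: tuple  # symbolic dimension names, e.g. ("B", "S")

    def evaluate(self, env):
        # Substitute concrete dimension values once they are known at runtime.
        return self.coeff * prod(env[d] for d in self.dims)


def compare(a, b):
    """Order two symbolic sizes at compile time when possible.

    Returns "<", ">", or "==" if both sizes use the same symbolic dims
    (so only the coefficients matter), otherwise None to signal that
    the comparison must wait for runtime values.
    """
    if sorted(a.dims) != sorted(b.dims):
        return None
    if a.coeff == b.coeff:
        return "=="
    return "<" if a.coeff < b.coeff else ">"


def should_rematerialize(live_sizes, budget_elems, env):
    """Runtime check (an assumed policy): free and later recompute tensors
    only if the live tensors exceed the element budget for the actual shapes."""
    total = sum(s.evaluate(env) for s in live_sizes)
    return total > budget_elems


# Example: an activation of 4*B*S elements versus tensors of B*S and B*H elements.
act = SymSize(4, ("B", "S"))
small = SymSize(1, ("B", "S"))
other = SymSize(1, ("B", "H"))

compile_time_order = compare(act, small)   # decidable without concrete shapes
undecided = compare(act, other)            # needs runtime values
fits = should_rematerialize([act, other], 100, {"B": 2, "S": 8, "H": 16})
too_big = should_rematerialize([act, other], 100, {"B": 4, "S": 8, "H": 16})
```

In this sketch, a scheduler could use `compare` to prefer operation orders whose peak memory is provably no larger for every shape, and fall back to the runtime check only when the symbolic comparison is inconclusive.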