Muon-Accelerated Attention Distillation for Real-Time Edge Synthesis via Optimized Latent Diffusion
Journal: arXiv
Published Date: Apr 11, 2025
Abstract
Recent advances in visual synthesis have leveraged diffusion models and
attention mechanisms to achieve high-fidelity artistic style transfer and
photorealistic text-to-image generation. However, real-time deployment on edge
devices remains challenging due to computational and memory constraints. We
propose Muon-AD, a co-designed framework that integrates the Muon optimizer
with attention distillation for real-time edge synthesis. By eliminating
gradient conflicts through orthogonal parameter updates and dynamic pruning,
Muon-AD achieves 3.2 times faster convergence than Stable
Diffusion-TensorRT while also improving synthesis quality (15% lower FID, 4%
higher SSIM). Our framework reduces peak memory to 7 GB on Jetson Orin and
enables 24 FPS real-time generation through mixed-precision quantization and
curriculum learning. Extensive experiments on COCO-Stuff and ImageNet-Texture
demonstrate that Muon-AD attains Pareto-optimal efficiency-quality trade-offs. We
further show a 65% reduction in communication overhead during distributed
training and 10 s/image generation on edge GPUs. These advancements pave the way
for democratizing high-quality visual synthesis in resource-constrained
environments.
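To make the phrase "orthogonal parameter updates" concrete, the following is a minimal NumPy sketch of a Muon-style step, assuming the publicly described Muon recipe (accumulate momentum, then replace the 2-D update matrix with its nearest orthogonal matrix). It is not the authors' implementation. Public Muon uses a tuned quintic Newton-Schulz iteration with Nesterov momentum and a shape-dependent scale factor; for clarity, this sketch swaps in the classic cubic Newton-Schulz iteration and plain heavy-ball momentum. The helper names `newton_schulz_orthogonalize` and `muon_style_step`, and the values of `lr`, `beta`, and `steps`, are hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 30, eps: float = 1e-7) -> np.ndarray:
    """Approximate the orthogonal polar factor U V^T of g = U S V^T.

    Classic cubic Newton-Schulz iteration X <- 1.5 X - 0.5 (X X^T) X, which
    drives every singular value in (0, sqrt(3)) toward 1 (assumes g has full rank).
    """
    # Frobenius normalization puts all singular values in (0, 1], inside the convergence region.
    x = g / (np.linalg.norm(g) + eps)
    # Iterate on the wide orientation so the Gram matrix x @ x.T is the smaller one.
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x


def muon_style_step(w: np.ndarray, grad: np.ndarray, momentum_buf: np.ndarray,
                    lr: float = 0.02, beta: float = 0.95):
    """One simplified Muon-style update for a 2-D weight matrix (hypothetical helper)."""
    momentum_buf = beta * momentum_buf + grad  # heavy-ball momentum (public Muon uses Nesterov)
    update = newton_schulz_orthogonalize(momentum_buf)  # all singular values pushed to 1
    return w - lr * update, momentum_buf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Build a 4x3 matrix with known singular vectors and singular values 3, 2, 1.
    u, _ = np.linalg.qr(rng.standard_normal((4, 3)))
    v, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    g = u @ np.diag([3.0, 2.0, 1.0]) @ v.T
    o = newton_schulz_orthogonalize(g)
    print(np.allclose(o, u @ v.T, atol=1e-6))  # True: matches the exact polar factor
```

Orthogonalizing the update equalizes the magnitude of every singular direction, so no single dominant gradient direction swamps the others; this is the intuition usually given for Muon, and the abstract's claim about gradient conflicts should be read in that light rather than as something this sketch demonstrates.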