MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production

Journal: arXiv

Published Date: May 16, 2025

Abstract

We present MegaScale-MoE, a production system tailored for the efficient training of large-scale mixture-of-experts (MoE) models. MoE emerges as a promising architecture to scale large language models (LLMs) to unprecedented sizes, thereby enhancing model performance. However, existing MoE training systems experience a degradation in training efficiency, exacerbated by the escalating scale of MoE models and the continuous evolution of hardware. Recognizing the pivotal role of efficient communication in enhancing MoE training, MegaScale-MoE customizes communication-efficient parallelism strategies for attention and FFNs in each MoE layer and adopts a holistic approach to overlap communication with computation at both inter- and intra-operator levels. Additionally, MegaScale-MoE applies communication compression with adjusted communication patterns to lower precision, further improving training efficiency. When training a 352B MoE model on 1,440 NVIDIA Hopper GPUs, MegaScale-MoE achieves a training throughput of 1.41M tokens/s, improving the efficiency by 1.88$\times$ compared to Megatron-LM. We share our operational experience in accelerating MoE training and hope that by offering our insights in system design, this work will motivate future research in MoE systems.

Authors

Chao Jin
Ziheng Jiang
Zhihao Bai
Zheng Zhong
Juncai Liu
Xiang Li
Ningxin Zheng
Xi Wang
Cong Xie
Qi Huang
Wen Heng
Yiyuan Ma
Wenlei Bao
Size Zheng
Yanghua Peng
Haibin Lin
Xuanzhe Liu
Xin Jin
Xin Liu

External Resources

View on arXiv (http://arxiv.org/abs/2505.11432v2)
