Optimizing Distributed Training Approaches for Scaling Neural Networks
Journal: arXiv
Published Date: Mar 29, 2025
Abstract
This paper presents a comparative analysis of distributed training strategies
for large-scale neural networks, focusing on data parallelism, model
parallelism, and hybrid approaches. We evaluate these strategies on image
classification with the CIFAR-100 dataset, measuring training time,
convergence rate, and model accuracy. Our experiments show that hybrid
parallelism achieves a 3.2x speedup over single-device training while
maintaining comparable accuracy. We propose an adaptive scheduling
algorithm that dynamically switches between parallelism strategies based on
network characteristics and available computational resources, resulting in an
additional 18% improvement in training efficiency.
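The abstract does not say how the adaptive scheduler chooses among data, model, and hybrid parallelism. The sketch below is a hypothetical illustration only and is not the authors' algorithm. It assumes a single memory estimate per model replica (`ModelProfile.memory_gb`) and a fixed per-device memory budget, and it applies a simple rule. If one replica fits on a device, it uses data parallelism. If sharded replicas fit at least twice across the devices, it uses hybrid parallelism (shard within groups, replicate across groups). Otherwise it uses model parallelism. All class and function names are invented for this example.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    DATA = "data parallel"
    MODEL = "model parallel"
    HYBRID = "hybrid"


@dataclass
class Resources:
    num_devices: int         # accelerators available to the job
    device_memory_gb: float  # usable memory per device


@dataclass
class ModelProfile:
    # Hypothetical estimate: params + optimizer state + activations for one replica
    memory_gb: float


def choose_strategy(model: ModelProfile, res: Resources) -> tuple[Strategy, int]:
    """Hypothetical rule: return a strategy and the number of shards per replica."""
    shards = max(1, math.ceil(model.memory_gb / res.device_memory_gb))
    if shards > res.num_devices:
        raise ValueError("model does not fit even when split across all devices")
    if shards == 1:
        # A full replica fits on one device: replicate it on every device.
        return Strategy.DATA, 1
    if res.num_devices // shards >= 2:
        # Room for at least two sharded replicas: shard within each group,
        # then replicate across groups (data parallelism over model-parallel groups).
        return Strategy.HYBRID, shards
    # Only one sharded replica fits: pure model parallelism.
    return Strategy.MODEL, shards


if __name__ == "__main__":
    model = ModelProfile(memory_gb=40.0)
    # Re-running the rule when resources change is one simple way to "switch" strategies.
    for res in (Resources(8, 16.0), Resources(4, 16.0), Resources(2, 80.0)):
        strategy, shards = choose_strategy(model, res)
        print(res, "->", strategy.value, f"({shards} shard(s) per replica)")
```

With these assumed numbers, the same 40 GB model maps to hybrid parallelism on eight 16 GB devices, model parallelism on four 16 GB devices, and data parallelism on two 80 GB devices. A real scheduler would presumably also weigh communication cost, throughput measurements, and convergence behavior, and this rule does not model any of those factors.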