Double-Stage Feature-Level Clustering-Based Mixture of Experts Framework
Journal: arXiv
Published Date: Mar 12, 2025
Abstract
The Mixture-of-Experts (MoE) model has been successful in deep learning (DL).
However, its architecture is complex, and its advantages over dense models in
image classification remain unclear. In previous studies, MoE performance has often
been affected by noise and outliers in the input space. Some approaches
incorporate input clustering for training MoE models, but most clustering
algorithms lack access to labeled data, limiting their effectiveness. This
paper introduces the Double-stage Feature-level Clustering and
Pseudo-labeling-based Mixture of Experts (DFCP-MoE) framework, which consists
of input feature extraction, feature-level clustering, and a computationally
efficient pseudo-labeling strategy. This approach reduces the impact of noise
and outliers while leveraging a small subset of labeled data to label a large
portion of unlabeled inputs. We propose a conditional end-to-end joint training
method that improves expert specialization by training the MoE model on
well-labeled, clustered inputs. Unlike traditional MoE and dense models, the
DFCP-MoE framework effectively captures input space diversity, leading to
competitive inference results. We validate our approach on three benchmark
datasets for multi-class classification tasks.
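
The clustering and pseudo-labeling steps described above can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or the labeling rule, so the choices here are assumptions made purely for illustration: plain k-means on already-extracted feature vectors, followed by a majority vote that propagates the labels of a small labeled subset to every member of the same cluster. The function names (kmeans, pseudo_label) and the toy data are hypothetical and are not taken from the paper, and the double-stage aspect of the framework is not modeled.

```python
from collections import Counter

def kmeans(points, centroids, iters=10):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering).
    Returns (cluster index per point, final centroids)."""
    centroids = [tuple(c) for c in centroids]
    assign = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign, centroids

def pseudo_label(assign, known_labels):
    """Propagate labels from a small labeled subset to whole clusters.
    known_labels maps point index -> label. Each cluster takes the majority
    label of its labeled members; clusters without any stay None."""
    votes = {}
    for idx, label in known_labels.items():
        votes.setdefault(assign[idx], Counter())[label] += 1
    cluster_label = {k: c.most_common(1)[0][0] for k, c in votes.items()}
    return [cluster_label.get(a) for a in assign]

# Toy "extracted features": two well-separated groups of 2-D vectors.
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
assign, _ = kmeans(features, centroids=[features[0], features[3]])

# Only two of the six inputs carry a ground-truth label.
labels = pseudo_label(assign, known_labels={0: "cat", 3: "dog"})
print(labels)  # ['cat', 'cat', 'cat', 'dog', 'dog', 'dog']
```

In this sketch, two labeled inputs are enough to label all six, which mirrors the idea of using a small labeled subset to label a large portion of unlabeled inputs. The resulting pseudo-labeled clusters could then serve as the training inputs for the individual experts.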