Label-shift robust federated feature screening for high-dimensional classification
Journal: arXiv
Published Date: May 31, 2025
Abstract
Distributed and federated learning are important tools for high-dimensional
classification of large datasets. To reduce computational costs and overcome
the curse of dimensionality, feature screening plays a pivotal role in
eliminating irrelevant features during data preprocessing. However, data
heterogeneity, particularly label shift across clients, presents
significant challenges for feature screening. This paper introduces a general
framework that unifies existing screening methods and proposes a novel utility,
label-shift robust federated feature screening (LR-FFS), along with its
federated estimation procedure. The framework facilitates a uniform analysis of
methods and systematically characterizes their behaviors under label shift
conditions. Building upon this framework, LR-FFS uses conditional
distribution functions and expectations to address label shift without added
computational burden, and it remains robust to model misspecification and
outliers. Additionally, the federated procedure ensures computational
efficiency and privacy protection while maintaining screening effectiveness
comparable to centralized processing. We also provide a false discovery rate
(FDR) control method for federated feature screening. Experimental results and
theoretical analyses demonstrate LR-FFS's superior performance across diverse
client environments, including those with varying class distributions, sample
sizes, and missing categorical data.
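
To make the idea of a label-shift robust, federated screening utility concrete, the following is a minimal sketch and not the paper's LR-FFS utility, whose exact form the abstract does not give. It swaps in a Kolmogorov-type distance between class-conditional empirical distribution functions, which depend only on the distribution of a feature given the label and are therefore unaffected by how class proportions vary across clients. The function names `client_summary` and `server_utility`, the shared evaluation `grid`, and the choice of distance are all assumptions made for illustration. Each client sends only per-class counts, and the server adds them up, so the federated result matches what pooled data would give on the same grid.

```python
import numpy as np

def client_summary(X, y, grid, n_classes):
    """Hypothetical client step: per-class counts on a shared grid.

    counts[k, j, g] = number of local samples with label k and X[:, j] <= grid[g].
    sizes[k]        = number of local samples with label k (may be 0).
    """
    p = X.shape[1]
    counts = np.zeros((n_classes, p, grid.size))
    sizes = np.zeros(n_classes)
    for k in range(n_classes):
        Xk = X[y == k]
        sizes[k] = Xk.shape[0]
        # (n_k, p, 1) <= (G,) broadcasts to (n_k, p, G); summing over samples
        # gives zeros when the class is absent on this client.
        counts[k] = (Xk[:, :, None] <= grid).sum(axis=0)
    return counts, sizes

def server_utility(summaries):
    """Hypothetical server step: aggregate counts and score each feature.

    Assumes every class appears on at least one client. The score is the
    largest gap between any two class-conditional ECDFs over the grid.
    """
    counts = sum(c for c, _ in summaries)
    sizes = sum(s for _, s in summaries)
    cdf = counts / sizes[:, None, None]  # class-conditional ECDFs, shape (K, p, G)
    n_classes, p, _ = cdf.shape
    util = np.zeros(p)
    for a in range(n_classes):
        for b in range(a + 1, n_classes):
            gap = np.abs(cdf[a] - cdf[b]).max(axis=1)  # sup over grid points
            util = np.maximum(util, gap)
    return util
```

Features would then be ranked by the returned scores and the top ones retained. Because the counts are additive, a client that lacks a class entirely simply contributes zeros for it; the grid resolution controls how closely the score approximates the exact supremum.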