PrivateBoost: Rethinking Federated Learning for Patient-Owned Medical Data: Learning from Single Records at Scale
Journal: medRxiv
Published Date: Apr 17, 2026
Abstract
The translation of artificial intelligence into clinical practice depends largely on access to data that patients are understandably reluctant to surrender. Federated learning promises to resolve this tension by training models without centralising raw records, yet virtually all deployed systems assume that each participating site holds hundreds or thousands of samples, a condition that hospitals and research consortia may satisfy but individual patients never will. Here, we show that a fundamentally different regime is both theoretically tractable and practically viable: one in which each participant contributes a single diagnostic record, retains it entirely on their own device, and never communicates with any other participant. We introduce PrivateBoost, a federated gradient boosting framework in which clients distribute Shamir secret shares to a small set of independent shareholders, enabling an aggregator to reconstruct only the aggregate statistics required for tree construction while remaining information-theoretically blind to individual contributions. Crucially, this approach exploits a deep compatibility between gradient boosting and secret sharing: model training depends only on additive aggregates, which align naturally with secure summation, so clients need no coordination with one another and the protocol remains efficient. A path-hiding extension further prevents shareholders from inferring any patient's traversal of the decision tree, closing a side channel that standard protocols leave exposed. Evaluated across four medical datasets encompassing up to 70,692 single-sample clients, PrivateBoost matches centralised XGBoost with identical AUC-ROC (0.818), tolerates up to 80% client dropout per round, and requires fewer than 600 KB of communication per patient over a complete training run, comfortably within the constraints of a mobile device on a standard data plan.
These results establish that privacy-preserving collective learning from patient-owned records is not merely a theoretical aspiration but an achievable engineering reality, opening a path towards medical AI that improves with every consenting patient without requiring any of them to share their data.
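The abstract's central observation is that XGBoost-style split finding consumes only sums of per-sample gradients and hessians, and Shamir shares add up to shares of the sum. The sketch below illustrates that interaction for a single feature at a single tree node. Everything concrete in it is an assumption of ours rather than a detail reported for PrivateBoost: the 61-bit prime field, fixed-point encoding with scale 10^6, five shareholders with reconstruction threshold three, logistic loss evaluated at an initial prediction of 0.5, and the hypothetical helper names `share`, `reconstruct`, `secure_histogram`, and `best_split`. It does not attempt path hiding, dropout handling, or networking.

```python
import random

# Illustrative parameters (assumptions, not values reported for PrivateBoost)
PRIME = 2**61 - 1   # Mersenne prime; all share arithmetic is modulo PRIME
SCALE = 10**6       # fixed-point scale used to embed real numbers in the field


def encode(x: float) -> int:
    """Embed a real number in the field via fixed-point encoding."""
    return round(x * SCALE) % PRIME


def decode(v: int) -> float:
    """Invert encode(); the upper half of the field represents negatives."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE


def share(secret: int, n: int, t: int, rng: random.Random) -> list[int]:
    """Split `secret` into n Shamir shares; any t of them reconstruct it."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of f(x)
            y = (y * x + c) % PRIME
        shares.append(y)
    return shares


def reconstruct(points: list[tuple[int, int]]) -> int:
    """Recover f(0) from t points (x, f(x)) by Lagrange interpolation."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def secure_histogram(clients, n_bins, n_holders=5, threshold=3, seed=0):
    """Aggregate per-bin gradient/hessian sums without revealing any client's values.

    `clients` is a list of (bin_index, gradient, hessian), one entry per patient.
    """
    rng = random.Random(seed)
    # holder_sums[k][e]: shareholder k's running sum of shares for histogram entry e
    holder_sums = [[0] * (2 * n_bins) for _ in range(n_holders)]
    for bin_idx, g, h in clients:
        # The client shares every entry (zeros included), so no shareholder
        # can tell which bin its single record falls into.
        vec = [0] * (2 * n_bins)
        vec[bin_idx] = encode(g)
        vec[n_bins + bin_idx] = encode(h)
        for e, value in enumerate(vec):
            for k, s in enumerate(share(value, n_holders, threshold, rng)):
                holder_sums[k][e] = (holder_sums[k][e] + s) % PRIME
    # Shamir sharing is linear, so summed shares are shares of the sum.
    # The aggregator queries any `threshold` shareholders (here the first ones).
    totals = [
        decode(reconstruct([(k + 1, holder_sums[k][e]) for k in range(threshold)]))
        for e in range(2 * n_bins)
    ]
    return totals[:n_bins], totals[n_bins:]


def best_split(G, H, lam=1.0, gamma=0.0):
    """Standard XGBoost split gain, computed from aggregate histograms only."""
    G_tot, H_tot = sum(G), sum(H)
    best_gain, best_bin = float("-inf"), None
    GL = HL = 0.0
    for b in range(len(G) - 1):  # candidate split between bin b and bin b + 1
        GL += G[b]
        HL += H[b]
        GR, HR = G_tot - GL, H_tot - HL
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                      - G_tot**2 / (H_tot + lam)) - gamma
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_gain, best_bin


if __name__ == "__main__":
    # Logistic loss at an initial prediction p = 0.5 gives g = p - y, h = p(1 - p).
    labels = [(0, 1), (1, 0), (1, 0), (3, 1)]  # hypothetical (bin, label) pairs
    clients = [(b, 0.5 - y, 0.25) for b, y in labels]
    G, H = secure_histogram(clients, n_bins=4)
    print("G =", G, "H =", H, "best split:", best_split(G, H))
```

Because the sharing is linear, each shareholder only adds the shares it receives and no client ever contacts another, which mirrors the no-coordination property described above. Any single shareholder sees only uniformly random field elements, and the aggregator recovers per-bin totals, never an individual gradient. A deployable system would additionally need authenticated channels, dropout-aware rounds, and the path-hiding extension, none of which this sketch models.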