Pantheon: Personalized Multi-objective Ensemble Sort via Iterative Pareto Policy Optimization
Journal: arXiv
Published Date: May 20, 2025
Abstract
In this paper, we present Pantheon, our milestone ensemble sort work, together
with our first-hand practical experience. Pantheon transforms ensemble sorting
from a "human-curated art" into a "machine-optimized science". Compared with
formulation-based ensemble sort, Pantheon has the following advantages: (1)
Personalized joint training: Pantheon is jointly trained with the real-time
ranking model, which lets it capture users' ever-changing personalized interests
accurately. (2) Representation inheritance: instead of the highly compressed
Pxtrs, Pantheon takes the fine-grained hidden states as model input, so it can
build on the ranking model's representations to enhance its own model complexity.
Meanwhile, to reach a balanced multi-objective ensemble sort, we further devise
an iterative Pareto policy optimization (IPPO) strategy that considers
the multiple objectives at the same time. To our knowledge, this paper is the
first work to replace the entire formulation-based ensemble sort in an
industrial RecSys. Pantheon has been fully deployed in Kuaishou live-streaming
services, serving 400 million users daily.
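
For readers unfamiliar with the baseline that Pantheon replaces, the sketch below shows what a formulation-based ensemble sort can look like in its simplest form. It is an illustration only, not the formula used at Kuaishou: it assumes that Pxtrs are per-objective predicted scores, and the objective names, weights, exponents, and the multiplicative fusion form are all hypothetical.

```python
from math import prod

# Hypothetical hand-tuned parameters, one pair per objective.
# In a formulation-based system these are curated by people, not learned.
WEIGHTS = {"ctr": 1.0, "long_view": 2.0, "like": 0.5}
EXPONENTS = {"ctr": 1.0, "long_view": 0.8, "like": 1.2}


def formula_score(pxtrs):
    """Fuse per-objective predictions into a single ranking score.

    Assumed form: product over objectives of (1 + w * p) ** a.
    """
    return prod((1.0 + WEIGHTS[k] * pxtrs[k]) ** EXPONENTS[k] for k in WEIGHTS)


def ensemble_sort(candidates):
    """Order candidates by the fused score, highest first."""
    return sorted(candidates, key=lambda c: formula_score(c["pxtrs"]), reverse=True)


# Two made-up candidates with made-up predictions.
candidates = [
    {"id": "a", "pxtrs": {"ctr": 0.10, "long_view": 0.05, "like": 0.01}},
    {"id": "b", "pxtrs": {"ctr": 0.05, "long_view": 0.30, "like": 0.02}},
]
ranked_ids = [c["id"] for c in ensemble_sort(candidates)]
```

In this sketch the same fixed weights apply to every user and each objective is reduced to a single scalar before fusion. The abstract contrasts this with a model that is trained jointly with the ranking model and reads its hidden states instead.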