Pantheon: Personalized Multi-objective Ensemble Sort via Iterative Pareto Policy Optimization

Journal: arXiv
Published Date:

Abstract

In this paper, we present Pantheon, our milestone ensemble sort work, together with first-hand practical experience from deploying it. Pantheon transforms ensemble sorting from a "human-curated art" into a "machine-optimized science". Compared with formulation-based ensemble sort, Pantheon has the following advantages: (1) Personalized joint training: Pantheon is trained jointly with the real-time ranking model, so it can accurately capture users' ever-changing personalized interests. (2) Representation inheritance: instead of the highly compressed Pxtrs, Pantheon takes the ranking model's fine-grained hidden states as input, which lets it benefit from the ranking model's representations and increase its own model capacity. Meanwhile, to reach a balanced multi-objective ensemble sort, we further devise an iterative Pareto policy optimization (IPPO) strategy that considers the multiple objectives at the same time. To our knowledge, this is the first work to replace the entire formulation-based ensemble sort in an industrial RecSys; Pantheon has been fully deployed in Kuaishou's live-streaming services, serving 400 million users daily.

Authors

  • Jiangxia Cao
  • Pengbo Xu
  • Yin Cheng
  • Kaiwei Guo
  • Jian Tang
  • Shijun Wang
  • Dewei Leng
  • Shuang Yang
  • Zhaojie Liu
  • Yanan Niu
  • Guorui Zhou
  • Kun Gai