MODRL-TA: A Multi-Objective Deep Reinforcement Learning Framework for Traffic Allocation in E-Commerce Search
Journal:
arXiv
Published Date:
Jul 22, 2024
Abstract
Traffic allocation is the process of redistributing natural traffic across
products by adjusting their positions in the post-search phase, with the aim of
fostering merchant growth, meeting customer demands, and maximizing the
interests of the various parties on e-commerce platforms.
Existing methods based on learning to rank neglect the long-term value of
traffic allocation, whereas reinforcement learning approaches struggle to
balance multiple objectives and to handle cold starts in real-world data
environments. To address these issues, this paper
proposes a multi-objective deep reinforcement learning framework consisting of
multi-objective Q-learning (MOQ), a decision fusion algorithm (DFM) based on
the cross-entropy method (CEM), and a progressive data augmentation system (PDA).
Specifically, MOQ constructs ensemble RL models, each dedicated to one
objective, such as click-through rate or conversion rate. These models
individually determine the position of items as actions, aiming to estimate the
long-term value of multiple objectives from an individual perspective. Then we
employ DFM to dynamically adjust weights among objectives to maximize long-term
value, addressing temporal dynamics in objective preferences in e-commerce
scenarios. Initially, PDA trains MOQ with simulated data from offline logs. As
experiments progress, it strategically integrates real user interaction data,
ultimately replacing the simulated dataset to alleviate distributional shift
and the cold-start problem. Experimental results on real-world online
e-commerce systems demonstrate the significant improvements of MODRL-TA, and we
have successfully deployed MODRL-TA on an e-commerce search platform.
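
The abstract does not give implementation details, so the following is only a minimal sketch of how a cross-entropy-method search over objective weights could be combined with per-objective value estimates to pick a position. Every name here (`cem_search`, `pick_position`, `toy_score`, `TARGET`) and every number is hypothetical. In particular, `toy_score` is a made-up stand-in for the long-term value signal, and the softmax parameterization of the weights is an assumption, not the paper's DFM.

```python
import numpy as np


def softmax(z):
    """Map unconstrained logits to non-negative weights that sum to 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def cem_search(score_fn, n_objectives, n_iter=30, pop_size=100,
               elite_frac=0.2, seed=0):
    """Cross-entropy method over objective weights (illustrative only)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(n_objectives)  # zero logits correspond to uniform weights
    std = np.ones(n_objectives)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iter):
        # Sample candidate logits from the current diagonal Gaussian.
        logits = rng.normal(mean, std, size=(pop_size, n_objectives))
        scores = np.array([score_fn(softmax(z)) for z in logits])
        # Keep the best-scoring candidates and refit the Gaussian to them.
        elite = logits[np.argsort(scores)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3  # small floor keeps sampling non-degenerate
    return softmax(mean)


def pick_position(q_values, weights):
    """Fuse per-objective value estimates and return the best position index."""
    fused = weights @ q_values  # shape: (n_positions,)
    return int(np.argmax(fused))


# Hypothetical "best" trade-off between two objectives (e.g. CTR and CVR).
TARGET = np.array([0.7, 0.3])


def toy_score(w):
    """Made-up reward: higher when the weights are closer to TARGET."""
    return -float(np.sum((w - TARGET) ** 2))


weights = cem_search(toy_score, n_objectives=2)

# Made-up per-objective value estimates for three candidate positions.
q_values = np.array([[0.9, 0.4, 0.1],   # objective 1 (e.g. clicks)
                     [0.1, 0.5, 0.8]])  # objective 2 (e.g. conversions)
position = pick_position(q_values, weights)
```

In a setting like the one described above, `score_fn` would presumably be replaced by an estimate of long-term value under the fused policy, and the search rerun periodically so the weights can follow changing objective preferences.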