Unleashing the Potential of Two-Tower Models: Diffusion-Based Cross-Interaction for Large-Scale Matching
Journal: arXiv
Published Date: Feb 28, 2025
Abstract
Two-tower models are widely adopted in the industrial-scale matching stage
across a broad range of application domains, such as content recommendations,
advertising systems, and search engines. By separating user and item
representations, these models screen large-scale candidate item sets
efficiently. However, this decoupled design also neglects potential
information interaction between the user and item representations.
Current state-of-the-art (SOTA) approaches include adding a shallow fully
connected layer (e.g., COLD), which is limited by its computational cost and
can only be used in the ranking stage. To keep inference efficient, another
approach attempts to capture historical positive interaction information from
the other tower by treating it as an input feature (e.g., DAT). Later research
showed that the gains achieved by this method remain limited because it lacks
guidance on the user's next intent. To address these challenges, we
propose a "cross-interaction decoupling architecture" within our matching
paradigm. This user-tower architecture leverages a diffusion module to
reconstruct the next positive intention representation and employs a
mixed-attention module to facilitate comprehensive cross-interaction. When
generating the next positive intention, we further improve the accuracy of its
reconstruction by explicitly extracting the temporal drift within user behavior
sequences. Experiments on two real-world datasets and one industrial dataset
demonstrate that our method significantly outperforms SOTA two-tower models,
and that our diffusion approach outperforms other generative models in
reconstructing item representations.
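
To make the decoupling described in the abstract concrete, the following is a minimal sketch of plain two-tower matching, not the proposed cross-interaction architecture. The names `user_tower`, `item_tower`, `build_index`, and `retrieve`, as well as the fixed weight matrices, are illustrative assumptions and do not come from the paper; a real system would use learned encoders and an approximate nearest-neighbor index. The sketch shows why the design scales: item embeddings are computed once offline, and the user and item sides meet only in a final dot product.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def user_tower(features: Vector) -> Vector:
    # Hypothetical stand-in for a learned user encoder: a fixed linear map.
    weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    return [dot(row, features) for row in weights]


def item_tower(features: Vector) -> Vector:
    # Hypothetical stand-in for a learned item encoder with its own weights.
    # It never sees user features, which is the decoupling in question.
    weights = [[1.0, 0.0], [0.0, 1.0], [0.25, 0.25]]
    return [dot(row, features) for row in weights]


def build_index(items: Dict[str, Vector]) -> Dict[str, Vector]:
    # Item embeddings are precomputed offline, independent of any user.
    return {item_id: item_tower(feats) for item_id, feats in items.items()}


def retrieve(user_features: Vector, index: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    # At request time only the user tower runs; scoring is a dot product.
    user_emb = user_tower(user_features)
    scored = [(item_id, dot(user_emb, emb)) for item_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    index = build_index({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})
    print(retrieve([1.0, 0.0], index, k=2))
```

Because the two towers interact only through the final dot product, any user-item signal must already be encoded in the two embeddings; this is the limitation that the cross-interaction design in the abstract is meant to address.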