RADAR: Recall Augmentation through Deferred Asynchronous Retrieval
Journal: arXiv
Published Date: Jun 8, 2025
Abstract
Modern large-scale recommender systems employ a multi-stage ranking funnel
(Retrieval, Pre-ranking, Ranking) to balance engagement and computational
constraints (latency, CPU). However, the initial retrieval stage, often relying
on efficient but less precise methods like K-Nearest Neighbors (KNN), struggles
to effectively surface the most engaging items from billion-scale catalogs,
particularly when distinguishing highly relevant and engaging candidates from merely
relevant ones. We introduce Recall Augmentation through Deferred Asynchronous
Retrieval (RADAR), a novel framework that leverages asynchronous, offline
computation to pre-rank a significantly larger candidate set for each user with
the full-complexity ranking model. These top-ranked items are stored and
utilized as a high-quality retrieval source during online inference, bypassing
online retrieval and pre-ranking stages for these candidates. We demonstrate
through offline experiments that RADAR significantly boosts recall (2x
Recall@200 vs. a DNN retrieval baseline) by effectively combining a larger
retrieved candidate set with a more powerful ranking model. Online A/B tests
confirm a +0.8% lift in topline engagement metrics, validating RADAR as a
practical and effective method to improve recommendation quality under strict
online serving constraints.
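
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the in-memory dictionary standing in for the candidate store, and the policy of filling any remaining candidate budget from regular online retrieval are all assumptions made for illustration. The `recall_at_k` helper uses the standard definition of Recall@K (the fraction of relevant items found in the top K retrieved), which may differ in detail from the paper's evaluation protocol.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Set

Item = Hashable


def precompute_radar_candidates(
    user_ids: Iterable[str],
    candidate_pool_fn: Callable[[str], Iterable[Item]],  # hypothetical: large offline pool per user
    heavy_score_fn: Callable[[str, Item], float],        # hypothetical: full-complexity ranking model
    top_n: int,
) -> Dict[str, List[Item]]:
    """Offline, asynchronous step: score a large pool with the heavy model, keep the top_n per user."""
    store: Dict[str, List[Item]] = {}
    for user in user_ids:
        pool = candidate_pool_fn(user)
        store[user] = heapq.nlargest(
            top_n, pool, key=lambda item: heavy_score_fn(user, item)
        )
    return store


def online_candidates(
    user: str,
    store: Dict[str, List[Item]],
    online_retrieval_fn: Callable[[str], Iterable[Item]],  # hypothetical: e.g. KNN retrieval
    budget: int,
) -> List[Item]:
    """Online step: stored items go straight to the candidate list (no online
    retrieval or pre-ranking for them); the rest of the budget is filled from
    regular retrieval. The fill policy is an assumption of this sketch."""
    merged: List[Item] = list(store.get(user, []))[:budget]
    seen: Set[Item] = set(merged)
    for item in online_retrieval_fn(user):
        if len(merged) >= budget:
            break
        if item not in seen:
            merged.append(item)
            seen.add(item)
    return merged


def recall_at_k(retrieved: List[Item], relevant: Set[Item], k: int) -> float:
    """Fraction of relevant items that appear among the first k retrieved items."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)
```

In a production system the store would more plausibly be a key-value service refreshed on a schedule, and the merged list would feed the final ranking stage. The sketch only shows how the expensive scoring moves off the online request path.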