InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking
Journal:
arXiv
Published Date:
Jun 17, 2025
Abstract
Large Language Models (LLMs) have made significant strides across
various information retrieval tasks, particularly as rerankers, owing to their
strong generalization and knowledge-transfer capabilities acquired from
extensive pretraining. In parallel, the rise of LLM-based chat interfaces has
raised user expectations, encouraging users to pose more complex queries that
necessitate retrieval by "reasoning" over documents rather than through
simple keyword matching or semantic similarity. While some recent efforts have
exploited the reasoning abilities of LLMs for reranking such queries, considerable
potential for improvement remains. In that regard, we introduce InsertRank, an
LLM-based reranker that leverages lexical signals like BM25 scores during
reranking to further improve retrieval performance. InsertRank demonstrates
improved retrieval effectiveness on two benchmarks: BRIGHT, a reasoning benchmark spanning
12 diverse domains, and R2MED, a specialized medical reasoning retrieval
benchmark spanning 8 different tasks. We conduct an exhaustive evaluation and
several ablation studies and demonstrate that InsertRank consistently improves
retrieval effectiveness across multiple families of LLMs, including GPT,
Gemini, and DeepSeek models. In addition, we conduct ablation studies on
normalization, by varying the scale of the BM25 scores, and on positional bias, by
shuffling the order of the documents. With DeepSeek-R1, InsertRank achieves a
score of 37.5 on the BRIGHT benchmark and 51.1 on the R2MED benchmark,
surpassing previous methods.
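
As a rough illustration of the idea described in the abstract, the sketch below shows one way BM25 scores could be placed next to each candidate passage in a listwise reranking prompt, with optional score rescaling and document shuffling to mirror the normalization and positional-bias ablations. The prompt wording, the helper names `normalize_scores` and `build_listwise_prompt`, and the choice of min-max rescaling are assumptions made for this example; they are not taken from the paper.

```python
import random


def normalize_scores(scores, scale=1.0):
    """Min-max rescale raw BM25 scores into [0, scale] (an assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so they carry no ranking signal.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) * scale for s in scores]


def build_listwise_prompt(query, docs, bm25_scores, scale=None, shuffle=False, seed=0):
    """Build a listwise reranking prompt that shows each passage with its BM25 score.

    scale:   if given, scores are rescaled to [0, scale] before being shown.
    shuffle: if True, passage order is randomized; each passage keeps its own score.
    """
    if len(docs) != len(bm25_scores):
        raise ValueError("docs and bm25_scores must have the same length")

    scores = list(bm25_scores) if scale is None else normalize_scores(bm25_scores, scale)
    pairs = list(zip(docs, scores))
    if shuffle:
        # Seeded shuffle so the ablation is reproducible.
        random.Random(seed).shuffle(pairs)

    lines = [f"Query: {query}", "", "Passages:"]
    for i, (doc, score) in enumerate(pairs, start=1):
        lines.append(f"[{i}] (BM25 score: {score:.2f}) {doc}")
    lines.append("")
    lines.append(
        "Rank the passages from most to least relevant to the query. "
        "Answer with identifiers only, e.g. [2] > [1]."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_listwise_prompt(
        "why does ice float on water",
        ["Ice is less dense than liquid water.", "Water boils at 100 C at sea level."],
        [7.4, 2.1],
    )
    print(prompt)
```

The resulting string would be sent to whichever LLM performs the reranking; parsing the model's ranked identifiers is left out of this sketch.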