ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression

TL;DR

ResRank enhances retrieval efficiency and effectiveness via residual passage compression and end-to-end joint training.

cs.IR · Advanced · 2026-04-24
Xiaojie Ke, Shuai Zhang, Liansheng Sun, Yongjin Wang, Hengjun Jiang, Xiangkun Liu, Cunxin Gu, Jian Xu, Guanjun Jiang
information retrieval · large language models · listwise reranking · residual connections · compressed representation

Key Findings

Methodology

ResRank is a unified retrieval and listwise reranking framework that addresses efficiency and effectiveness bottlenecks caused by long input sequences. It compresses each candidate passage into a single embedding using an Encoder-LLM and replaces traditional autoregressive decoding with a cosine-similarity-based scoring mechanism. A dual-stage, multi-task training strategy simultaneously optimizes the encoder and reranker, ensuring alignment of retrieval and reranking objectives.

Key Results

  • ResRank demonstrates superior performance on TREC Deep Learning and eight BEIR benchmark datasets, achieving higher ranking effectiveness than existing methods without generating any tokens. On BEIR datasets, ResRank in single-pass mode surpasses RankMistral, ListT5-3B, and PE-Rank by over 5 absolute points in average nDCG@10.
  • On TREC Deep Learning 2019 and 2020 benchmarks, ResRank outperforms PE-Rank in single-pass mode and exceeds most distilled LLM rerankers.
  • Ablation studies confirm that residual connections, dual-stage training, end-to-end optimization, and multi-task learning are indispensable to the final performance.

Significance

ResRank is a significant contribution to information retrieval. It addresses the efficiency bottleneck caused by long input sequences while enhancing ranking effectiveness through residual connections and cosine similarity scoring. This opens new possibilities for real-time retrieval in industrial applications, especially in scenarios that must process large numbers of candidate passages efficiently. Additionally, ResRank's end-to-end joint training strategy offers a new way to align retrieval and reranking objectives.

Technical Contribution

ResRank provides several technical innovations. Firstly, it compresses each candidate passage into a single embedding, reducing input length. Secondly, it adopts a cosine similarity scoring mechanism, eliminating the bottleneck of autoregressive decoding. Finally, through dual-stage, multi-task end-to-end training, ResRank achieves alignment between retrieval and reranking objectives, significantly reducing training complexity.

Novelty

ResRank's novelty lies in applying the compression concept from multimodal large language models to text retrieval and reranking. Compared to previous methods, ResRank is the first to address the misalignment between compressed representation space and ranking space using a residual connection structure, and it eliminates the generation bottleneck through cosine similarity scoring.

Limitations

  • ResRank may still experience information loss when processing extremely long texts, particularly during the compression phase, where some details might be overlooked.
  • Although ResRank performs well under multi-task learning, its training requires substantial computational resources, which may make it unsuitable for resource-constrained environments.
  • In certain specific domains or datasets, ResRank's performance might not match that of models optimized specifically for those areas.

Future Work

Future research directions include further optimizing ResRank's compression algorithm to reduce information loss. Additionally, exploring efficient training methods for ResRank in resource-limited environments is essential. Another direction is to apply ResRank to more domains and datasets to verify its generality and adaptability.

AI Executive Summary

In the field of information retrieval, efficiently finding the most relevant content from a large pool of candidate passages has always been a challenge. Traditional large language models (LLMs), while effective, struggle with processing long input sequences efficiently, making them impractical for industrial deployment.

ResRank offers a novel solution to this problem. By compressing each candidate passage into a single embedding and using a cosine similarity scoring mechanism, ResRank maintains efficiency while significantly enhancing ranking effectiveness. Its end-to-end joint training strategy ensures alignment of retrieval and reranking objectives.

Technically, ResRank eliminates the bottleneck of traditional autoregressive decoding through residual passage compression and cosine similarity scoring. This innovation not only improves ranking efficiency but also opens new possibilities for real-time information retrieval.

Experimental results show that ResRank performs exceptionally well on the TREC Deep Learning and BEIR benchmarks, surpassing many existing methods while generating no tokens at all.

However, ResRank may still face information loss issues when processing extremely long texts. Future research could focus on further optimizing its compression algorithm. Additionally, exploring efficient training methods for ResRank in resource-constrained environments is crucial.

Overall, ResRank brings a new perspective to the field of information retrieval, offering innovative technology and significant effectiveness, providing ample space for future research and applications.

Deep Analysis

Background

Modern information retrieval systems typically employ a multi-stage pipeline: a lightweight first-stage retriever rapidly recalls candidate passages from a large corpus, and a more sophisticated reranker then refines the ranking order. With the advent of large language models, the reranking stage has witnessed remarkable progress: LLM-based listwise rerankers accept a query together with multiple candidate passages and directly output a permutation, substantially outperforming traditional cross-encoder approaches. However, deploying LLM-based listwise rerankers at scale faces two fundamental challenges. First, concatenating the full text of dozens or hundreds of candidate passages creates extremely long input sequences, triggering the 'lost in the middle' phenomenon and directly undermining ranking quality. Second, even when the input length is manageable, autoregressive decoding of passage identifiers, token by token, adds considerable overhead, especially when ranking long candidate lists.
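To ground this, here is a minimal retrieve-then-rerank sketch. The rank_bm25 package is real, but `llm_listwise_rerank` is a hypothetical stand-in for any LLM-based listwise reranker, not the paper's API:

```python
# Minimal two-stage retrieval pipeline sketch. `llm_listwise_rerank` is a
# hypothetical placeholder for an LLM-based listwise reranker.
from rank_bm25 import BM25Okapi

def retrieve_then_rerank(query, corpus, llm_listwise_rerank, k=100, final_k=10):
    # Stage 1: a lightweight lexical retriever rapidly recalls top-k candidates.
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    scores = bm25.get_scores(query.split())
    top_k = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]
    # Stage 2: the expensive listwise reranker reorders only those k candidates,
    # returning a permutation of indices into the candidate list.
    permutation = llm_listwise_rerank(query, [corpus[i] for i in top_k])
    return [top_k[j] for j in permutation][:final_k]
```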

Core Problem

Traditional large language models face significant efficiency bottlenecks when processing long input sequences. As input length grows, inference latency scales super-linearly, making it impractical for industrial deployment. Additionally, long input sequences lead to the 'lost in the middle' phenomenon, where information buried in the middle of long contexts is disproportionately neglected, directly undermining ranking quality. While sliding window strategies can partially alleviate this issue, the resulting multi-pass inference multiplies latency by a factor proportional to the number of windows, still failing to meet the demands of real-time applications.
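To make the latency multiplier concrete, here is the usual pass-count arithmetic for sliding-window reranking. The window and stride values are illustrative defaults, not the paper's settings:

```python
def num_window_passes(n_candidates: int, window: int = 20, stride: int = 10) -> int:
    """Sequential reranker calls needed to slide a window over the candidate list."""
    if n_candidates <= window:
        return 1
    # After the first window, each pass advances by `stride` until the list is covered.
    return 1 + -(-(n_candidates - window) // stride)  # ceiling division

# Reranking 100 BM25 candidates with a 20-passage window and stride 10 requires
# num_window_passes(100) == 9 sequential LLM calls instead of a single pass.
```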

Innovation

ResRank's core innovations include:

1. Residual Passage Compression: Inspired by multimodal LLMs, ResRank employs an Encoder-LLM to compress each candidate passage into a single dense embedding, which is directly fed into the Reranker-LLM's input space. A residual connection combines the original encoder embedding with the contextualized hidden state produced by the reranker, reducing learning difficulty and preserving passage-level information.

2. Similarity-Based Scoring: Replacing costly token-by-token autoregressive decoding, ResRank adopts a retrieval-inspired scoring mechanism: the reranker's hidden state at the end-of-sequence position, enriched by cross-passage contextual attention, serves as a global aggregation embedding, which is directly compared with each passage's fused representation via a single-step cosine similarity computation (a minimal sketch of this mechanism, together with the residual connection, follows this list).

3. Dual-Stage, Multi-Task, End-to-End Joint Training: ResRank trains the Encoder-LLM and Reranker-LLM in an end-to-end manner through a two-stage supervised fine-tuning process. Multi-task learning is employed to simultaneously optimize ranking and retrieval objectives, ensuring that the encoder preserves its retrieval capability while adapting to the reranker's requirements.
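A minimal sketch of innovations 1 and 2, assuming the reranker sees the query tokens followed by one embedding per passage and a final end-of-sequence position. Module names, shapes, and the sequence layout are assumptions for illustration, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def resrank_scores(query_states: torch.Tensor,
                   passage_embs: torch.Tensor,
                   reranker) -> torch.Tensor:
    # query_states: (q_len, d) query token states; passage_embs: (n, d), one
    # compressed Encoder-LLM embedding per candidate passage.
    hidden = reranker(query_states, passage_embs)  # (q_len + n + 1, d) hidden states
    n = passage_embs.size(0)
    ctx = hidden[-1 - n:-1]            # contextualized states at the passage slots
    fused = passage_embs + ctx         # residual connection (innovation 1)
    agg = hidden[-1]                   # EOS state as global aggregation embedding
    # One-step cosine-similarity scoring replaces autoregressive decoding (innovation 2):
    return F.cosine_similarity(agg.unsqueeze(0), fused, dim=-1)  # (n,) relevance scores
```

Sorting the candidates by these scores yields the final permutation without generating a single token.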

Methodology

ResRank's methodology details are as follows:

  • Passage Compression: Each candidate passage is compressed into a single embedding by the Encoder-LLM, reducing input length.
  • Residual Connection: Combines the encoder embedding with the reranker's contextualized hidden state to form the fused passage embedding, preserving passage-level information.
  • Similarity-Based Scoring: Uses cosine similarity to compute the relevance score between the global aggregation embedding and each fused passage embedding, eliminating autoregressive decoding.
  • Dual-Stage Training: The first stage establishes coarse-grained alignment, while the second stage provides fine-grained refinement, ensuring the encoder's retrieval capability is preserved throughout training (see the training-objective sketch after this list).
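As a hedged sketch of the multi-task objective, one plausible form combines a listwise ranking loss with an InfoNCE-style retrieval loss. The exact loss functions and the weight `lam` are assumptions; the paper's formulation may differ:

```python
import torch
import torch.nn.functional as F

def multitask_loss(rank_scores, best_idx, q_emb, p_embs, pos_idx, lam=0.5, tau=0.05):
    # Ranking objective: softmax cross-entropy over the reranker's scores, with the
    # most relevant passage's index as the target (assumed listwise loss form).
    l_rank = F.cross_entropy(rank_scores.unsqueeze(0), torch.tensor([best_idx]))
    # Retrieval objective: temperature-scaled contrastive loss that keeps the
    # Encoder-LLM's query/passage embeddings usable for first-stage retrieval.
    sims = F.cosine_similarity(q_emb.unsqueeze(0), p_embs, dim=-1) / tau
    l_retr = F.cross_entropy(sims.unsqueeze(0), torch.tensor([pos_idx]))
    # A single backward pass propagates both objectives end-to-end through the
    # shared Encoder-LLM and the Reranker-LLM.
    return l_rank + lam * l_retr
```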

Experiments

The experiments evaluate on the TREC Deep Learning 2019 and 2020 test sets and eight BEIR benchmark datasets. All models rerank the top-100 passages retrieved by BM25, with nDCG@10 as the primary evaluation metric. Training proceeds in two stages: the first stage uses 232,419 samples, and the second stage uses approximately 87,000 high-quality samples. Baselines cover a comprehensive suite of reranking models, including supervised models, unsupervised LLM-based models, and distillation-trained LLM models.

Results

Experimental results show that ResRank in single-pass mode surpasses RankMistral, ListT5-3B, and PE-Rank by over 5 absolute points in average nDCG@10 on BEIR datasets, particularly excelling on Signal and News datasets. On TREC Deep Learning 2019 and 2020 benchmarks, ResRank outperforms PE-Rank in single-pass mode and exceeds most distilled LLM rerankers. Ablation studies confirm that residual connections, dual-stage training, end-to-end optimization, and multi-task learning are indispensable to the final performance.

Applications

ResRank's application scenarios include:

  • Real-Time Information Retrieval: ResRank significantly enhances ranking efficiency and effectiveness in scenarios requiring efficient processing of large candidate sets, suitable for search engines and recommendation systems.
  • Industrial Search Engines: By reducing input length and eliminating autoregressive decoding, ResRank offers new possibilities for real-time information retrieval in industrial applications.
  • Natural Language Processing Tasks: ResRank's compression algorithm and scoring mechanism can be applied to other NLP tasks requiring efficient processing of long texts, such as text summarization and question answering systems.

Limitations & Outlook

ResRank may still experience information loss when processing extremely long texts, particularly during the compression phase, where some details might be lost. Additionally, although ResRank performs well under multi-task learning, its training requires substantial computational resources, which may make it unsuitable for resource-constrained environments. In certain domains or datasets, ResRank's performance might not match that of models optimized specifically for those areas. Future research could focus on further optimizing the compression algorithm to reduce information loss and on exploring efficient training methods for resource-limited environments.

Plain Language (Accessible to non-experts)

Imagine you're in a huge library, trying to find the book that best fits your current needs. The traditional method is to take out all the possible books and quickly skim through each one to see which one matches your needs. This is like how large language models handle long input sequences: they need to process a lot of information and might miss some important details.

ResRank is like a smart librarian who can quickly scan the cover and summary of each book and then use a special method to distill the essence of each book into a concise summary. This way, when you need to find the best book, you only need to look at these summaries.

What's even better is that this librarian will also consider your specific needs and use these summaries to efficiently decide which book is best for you. This is like ResRank's cosine similarity scoring mechanism, which completely eliminates the tedious process of generating book titles one by one.

In this way, ResRank not only finds the best book faster but also ensures you don't miss any important information. It provides a new, efficient solution for information retrieval.

ELI14 (Explained like you're 14)

Hey, buddy! Imagine you're playing a super complex game where you need to pick the best tool from a bunch of options to defeat the big boss. The traditional way is to take out all the tools and try each one to see which works best. This is like how large language models handle long inputs: they need to process a lot of information and might miss some important details.

But ResRank is like a super smart assistant who can quickly scan each tool's attributes and then use a special method to distill the essence of each tool into a concise summary. This way, when you need to pick a tool, you only need to look at these summaries.

What's even cooler is that this assistant will also consider your specific needs and use these summaries to efficiently decide which tool is best for you. This is like ResRank's scoring mechanism, which completely eliminates the tedious process of generating tool names one by one.

In this way, ResRank not only finds the best tool faster but also ensures you don't miss any important information. It provides a new, efficient solution for information retrieval.

Glossary

ResRank

ResRank is a unified retrieval and listwise reranking framework that enhances information retrieval efficiency and effectiveness through residual passage compression and end-to-end joint training.

ResRank is used in the paper as the core method to address efficiency and effectiveness bottlenecks caused by long input sequences.

Encoder-LLM

Encoder-LLM is a model used to compress candidate passages into single embeddings, reducing input length.

In ResRank, the Encoder-LLM compresses each candidate passage into a single embedding.

Residual Connection

A residual connection is a method that combines encoder embeddings with reranker contextual hidden states, preserving passage-level information and reducing learning difficulty.

In ResRank, residual connections address the misalignment between compressed representation space and ranking space.
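In symbols (notation assumed for illustration), the fused representation of passage i is simply:

```latex
% e_i: encoder embedding of passage i; h_i: its contextualized hidden state in the reranker
\tilde{p}_i = e_i + h_i
```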

Cosine Similarity

Cosine similarity is a measure used to calculate the similarity between two vectors, ranging from -1 to 1.

In ResRank, cosine similarity is used in the scoring mechanism, replacing traditional autoregressive decoding.
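The standard definition:

```latex
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert} \in [-1, 1]
```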

End-to-End Training

End-to-end training is a strategy that simultaneously optimizes multiple model components, ensuring alignment of their objectives.

ResRank uses end-to-end training to optimize both the encoder and the reranker.

nDCG@10

nDCG@10 is a metric used to evaluate the ranking effectiveness of information retrieval systems, considering the relevance and position of results.

In ResRank's experiments, nDCG@10 is used as the primary evaluation metric.
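One common formulation, where rel_i is the graded relevance of the result at rank i and IDCG@10 is the DCG of the ideal ordering:

```latex
\mathrm{DCG@10} = \sum_{i=1}^{10} \frac{2^{rel_i} - 1}{\log_2(i + 1)},
\qquad
\mathrm{nDCG@10} = \frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}}
```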

TREC Deep Learning

TREC Deep Learning is a benchmark dataset used to evaluate information retrieval systems, including 2019 and 2020 test sets.

ResRank is evaluated on the TREC Deep Learning benchmark.

BEIR Benchmark

BEIR Benchmark is a benchmark containing datasets from multiple domains, used to evaluate the generalization ability of information retrieval systems.

ResRank is evaluated on eight BEIR benchmark datasets.

BM25

BM25 is a commonly used probabilistic model-based information retrieval algorithm for calculating document-query relevance.

In ResRank's experiments, BM25 is used to retrieve the top-100 candidate passages.
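The standard Okapi BM25 scoring function, where f(t, d) is the frequency of term t in document d, |d| the document length, avgdl the average document length, and k_1, b tuning parameters:

```latex
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{IDF}(t) \cdot
\frac{f(t, d)\,(k_1 + 1)}{f(t, d) + k_1 \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)}
```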

Sliding Window Strategy

The sliding window strategy is a method for handling long input sequences by processing them in segments to reduce input length per step.

In traditional LLMs, the sliding window strategy is used to mitigate efficiency bottlenecks caused by long input sequences.

Open Questions (Unanswered questions from this research)

1. How can ResRank's compression algorithm be further optimized to reduce information loss? The current compression may overlook details in extremely long texts; future work could explore more efficient compression strategies.
2. How can ResRank be trained efficiently in resource-constrained environments? Although it performs well under multi-task learning, its training requires substantial computational resources.
3. How well does ResRank generalize to other domains and datasets? Current experiments focus on TREC Deep Learning and the BEIR benchmarks; future work could test its adaptability elsewhere.
4. How can ResRank be adapted to specific domains or datasets where specially optimized models currently outperform it?
5. How can ResRank's ranking effectiveness be improved further, especially on complex queries, where there is still room for improvement?

Applications

Immediate Applications

Real-Time Information Retrieval

ResRank significantly enhances ranking efficiency and effectiveness in scenarios requiring efficient processing of large candidate passages, suitable for search engines and recommendation systems.

Industrial Search Engines

By reducing input length and eliminating autoregressive decoding, ResRank offers new possibilities for real-time information retrieval in industrial applications.

Natural Language Processing Tasks

ResRank's compression algorithm and scoring mechanism can be applied to other NLP tasks requiring efficient processing of long texts, such as text summarization and question answering systems.

Long-term Vision

Cross-Domain Information Retrieval

ResRank's generality and adaptability have the potential to be applied in more domains, advancing cross-domain information retrieval.

Intelligent Search Assistant

In the future, ResRank could evolve into an intelligent search assistant, providing personalized search results by combining user needs and contextual information.

Abstract

Large language model (LLM) based listwise reranking has emerged as the dominant paradigm for achieving state-of-the-art ranking effectiveness in information retrieval. However, its reliance on feeding full passage texts into the LLM introduces two critical bottlenecks: the "lost in the middle" phenomenon degrades ranking quality as input length grows, and the inference latency scales super-linearly with sequence length, rendering it impractical for industrial deployment. In this paper, we present ResRank, a unified retrieval-reranking framework that fundamentally addresses both challenges. Inspired by multimodal LLMs that project visual inputs into compact token representations, ResRank employs an Encoder-LLM to compress each candidate passage into a single embedding, which is then fed alongside the query text into a Reranker-LLM for listwise ranking. To alleviate the misalignment between the compressed representation space and the ranking space, we introduce a residual connection structure that combines encoder embeddings with contextualized hidden states from the reranker. Furthermore, we replace the conventional autoregressive decoding with a one-step cosine-similarity-based scoring mechanism, eliminating the generation bottleneck entirely. ResRank is trained through a carefully designed dual-stage, multi-task, end-to-end joint optimization strategy that simultaneously trains the encoder and reranker, achieving learning objective alignment between retrieval and reranking while substantially reducing training complexity. Extensive experiments on TREC Deep Learning and eight BEIR benchmark datasets demonstrate that ResRank achieves competitive or superior ranking effectiveness compared to existing approaches while requiring zero generated tokens and processing only one token per passage, yielding a fundamentally better balance between effectiveness and efficiency.

cs.IR cs.AI

References (20)

E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker. Qi Liu, Yanzhao Zhang, Mingxin Li et al., 2025.
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models. Qi Liu, Bo Wang, Nan Wang et al., 2024.
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent. Weiwei Sun, Lingyong Yan, Xinyu Ma et al., 2023.
HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling. Junyi Chen, Lu Chi, Bingyue Peng et al., 2024.
Visual Instruction Tuning. Haotian Liu, Chunyuan Li, Qingyang Wu et al., 2023.
DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase et al., 2020.
Learning to Rank Using Gradient Descent. C. Burges, T. Shaked, E. Renshaw et al., 2005.
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models. Nandan Thakur, Nils Reimers, Andreas Rücklé et al., 2021.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Tri Dao, Daniel Y. Fu, Stefano Ermon et al., 2022.
Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search. Ziyang Zeng, Heming Jing, Jindong Chen et al., 2025.
Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods. G. Cormack, C. Clarke, Stefan Büttcher, 2009.
OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment. Jiaxin Deng, Shiyao Wang, Kuo Cai et al., 2025.
DiffuRank: Effective Document Reranking with Diffusion Language Models. Qi Liu, Kun Ai, Jiaxin Mao et al., 2026.
RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models. Ronak Pradeep, Sahel Sharifymoghaddam, Jimmy Lin, 2023.
Qwen3 Technical Report. An Yang, Anfeng Li, Baosong Yang et al., 2025.
Large Search Model: Redefining Search Stack in the Era of LLMs. Liang Wang, Nan Yang, Xiaolong Huang et al., 2023.
Document Ranking with a Pretrained Sequence-to-Sequence Model. Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep et al., 2020.
CompLLM: Compression for Long Context Q&A. G. Berton, Jayakrishnan Unnikrishnan, Son Tran et al., 2025.
Large Language Models for Information Retrieval: A Survey. Yutao Zhu, Huaying Yuan, Shuting Wang et al., 2023.
Multi-Stage Document Ranking with BERT. Rodrigo Nogueira, Wei Yang, Kyunghyun Cho et al., 2019.