- [2025-10] 🎉 Released DeAR-8B and DeAR-3B rerankers (both pointwise and listwise variants)
- [2025-10] 🎉 Released the DeAR Chain-of-Thought (CoT) dataset
- [2025-10] 🎉 Released the teacher model LLaMA2-13B-RankLLaMA
- [2025-08] 🎉 Paper accepted at EMNLP Findings 2025
- [2025-08] 🎉 DeAR achieves 90.97 nDCG@10 on NovelEval, outperforming GPT-4 by +3.09!
- [2025-08] ⚡ Fast inference: 2.2 s (pointwise), 11.16 s (listwise)
DeAR (Distillation Enhanced Adaptive Reranking) is a suite of high-performance neural rerankers designed to improve information retrieval through knowledge distillation and advanced training techniques. Our models are available in multiple sizes (3B and 8B parameters) and training paradigms (pointwise and listwise), offering flexibility for various deployment scenarios.
- 🎯 State-of-the-art Performance: Competitive reranking results on standard benchmarks
- 🔄 Multiple Training Paradigms: Both pointwise (Binary Cross-Entropy, RankNet) and listwise approaches
- 📦 Flexible Deployment: Full models and efficient LoRA adapters
- 🧠 Knowledge Distillation: Leveraging Chain-of-Thought reasoning from teacher models
- 🔌 Easy Integration: Simple APIs for both single document scoring and batch reranking
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load model
model_path = "abdoelsayed/dear-8b-reranker-ce-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path, torch_dtype=torch.bfloat16)
model.eval().cuda()
# Score a query-document pair
query = "What is llama?"
doc = "The llama is a domesticated South American camelid..."
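# max_length=228 = 32 query tokens + 196 passage tokens, matching the q_max_len/p_max_len used in training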
inputs = tokenizer(f"query: {query}", f"document: {doc}", return_tensors="pt", truncation=True, max_length=228)
inputs = {k: v.cuda() for k, v in inputs.items()}
with torch.no_grad():
score = model(**inputs).logits.squeeze().item()
print(f"Relevance score: {score}")| Model | Size | Type | Link |
|---|---|---|---|
| LLaMA2-RankLLaMA-Teacher | 13B | Teacher | 🤗 Hub |

| Model | Size | Training | Full/LoRA | Link |
|---|---|---|---|---|
| DeAR-8B-Listwise | 8B | Listwise | Full | 🤗 Hub |
| DeAR-8B-Listwise-LoRA | 8B | Listwise | LoRA | 🤗 Hub |

| Model | Size | Loss Function | Full/LoRA | Link |
|---|---|---|---|---|
| DeAR-8B-RankNet | 8B | RankNet | Full | 🤗 Hub |
| DeAR-8B-RankNet-LoRA | 8B | RankNet | LoRA | 🤗 Hub |
| DeAR-8B-CE | 8B | Binary Cross-Entropy | Full | 🤗 Hub |
| DeAR-8B-CE-LoRA | 8B | Binary Cross-Entropy | LoRA | 🤗 Hub |

| Model | Size | Loss Function | Full/LoRA | Link |
|---|---|---|---|---|
| DeAR-3B-RankNet | 3B | RankNet | Full | 🤗 Hub |
| DeAR-3B-RankNet-LoRA | 3B | RankNet | LoRA | 🤗 Hub |
| DeAR-3B-CE | 3B | Binary Cross-Entropy | Full | 🤗 Hub |
| DeAR-3B-CE-LoRA | 3B | Binary Cross-Entropy | LoRA | 🤗 Hub |

| Resource | Description | Link |
|---|---|---|
| DeAR-COT Dataset | Chain-of-Thought reasoning data for reranker training | 🤗 Hub |
rankify[all]
pyserini==0.22.1

pip install torch transformers peft accelerate
pip install pyserini==0.22.1

git clone https://github.com/DataScienceUIBK/DeAR-Reranking.git
cd DeAR-Reranking
pip install -r requirements.txt

import torch
from typing import List, Tuple
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig
DTYPE = {
"auto": None,
"float16": torch.float16,
"bfloat16": torch.bfloat16,
"float32": torch.float32,
}
def load_ranker(model_or_adapter_path: str, torch_dtype: str = "auto", device: str = "auto"):
"""
Load either a PEFT LoRA adapter or a merged/original model.
Args:
model_or_adapter_path: Path to PEFT adapter dir/HF repo OR merged model dir/HF repo
torch_dtype: Data type for model weights ("auto", "float16", "bfloat16", "float32")
device: Device to load model on ("auto", "cuda", "cpu")
Returns:
tokenizer, model, device_str
"""
dtype = DTYPE[torch_dtype]
# Try to interpret as a PEFT adapter first
is_peft = False
peft_cfg = None
try:
peft_cfg = PeftConfig.from_pretrained(model_or_adapter_path)
is_peft = True
except Exception:
is_peft = False
if is_peft:
# Load LoRA adapter
base_id = peft_cfg.base_model_name_or_path
tok = AutoTokenizer.from_pretrained(base_id)
if tok.pad_token is None:
tok.pad_token = tok.eos_token
tok.pad_token_id = tok.eos_token_id
tok.padding_side = "right"
base = AutoModelForSequenceClassification.from_pretrained(
base_id, num_labels=1, torch_dtype=dtype
)
model = PeftModel.from_pretrained(base, model_or_adapter_path)
model = model.merge_and_unload()
else:
# Load merged/original model
tok = AutoTokenizer.from_pretrained(model_or_adapter_path)
if tok.pad_token is None:
tok.pad_token = tok.eos_token
tok.pad_token_id = tok.eos_token_id
tok.padding_side = "right"
model = AutoModelForSequenceClassification.from_pretrained(
model_or_adapter_path, torch_dtype=dtype
)
model.eval()
# Configure tokenizer IDs for LLaMA-style models
if getattr(model.config, "pad_token_id", None) is None:
model.config.pad_token_id = tok.pad_token_id
if getattr(model.config, "bos_token_id", None) is None and tok.bos_token_id is not None:
model.config.bos_token_id = tok.bos_token_id
if getattr(model.config, "eos_token_id", None) is None and tok.eos_token_id is not None:
model.config.eos_token_id = tok.eos_token_id
if device == "auto":
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
return tok, model, device
@torch.inference_mode()
def score_pair(tokenizer, model, device, query: str, passage: str, title: str = "",
q_max_len: int = 32, p_max_len: int = 196) -> float:
"""Score a single query-document pair."""
inputs = tokenizer(
f"query: {query}",
f"document: {title} {passage}",
return_tensors="pt",
truncation=True,
max_length=q_max_len + p_max_len,
padding="max_length",
return_attention_mask=True,
)
inputs = {k: v.to(device) for k, v in inputs.items()}
logits = model(**inputs).logits
return float(logits.squeeze().item())
@torch.inference_mode()
def rerank(tokenizer, model, device, query: str, docs: List[Tuple[str, str]],
q_max_len: int = 32, p_max_len: int = 196, batch_size: int = 64):
"""
Rerank a list of documents for a given query.
Args:
docs: List of (title, passage) tuples
Returns:
List of (index, score) tuples sorted by score (descending)
"""
scores = []
for i in range(0, len(docs), batch_size):
chunk = docs[i:i + batch_size]
q_texts = [f"query: {query}"] * len(chunk)
d_texts = [f"document: {t} {p}" for t, p in chunk]
inputs = tokenizer(
q_texts, d_texts,
return_tensors="pt",
truncation=True,
padding=True,
max_length=q_max_len + p_max_len,
return_attention_mask=True,
)
inputs = {k: v.to(device) for k, v in inputs.items()}
logits = model(**inputs).logits.squeeze(-1)
scores.extend(logits.detach().cpu().tolist())
ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    return ranked

# Load model (either full model or LoRA adapter)
model_path = "abdoelsayed/dear-8b-reranker-ce-v1"
tokenizer, model, device = load_ranker(model_path, torch_dtype="bfloat16")
# Score a single query-document pair
query = "What is llama?"
title = "Llama"
passage = "The llama is a domesticated South American camelid..."
score = score_pair(tokenizer, model, device, query, passage, title)
print(f"Relevance score: {score}")# Rerank multiple documents for a query
query = "When did Thomas Edison invent the light bulb?"
docs = [
("", "Lightning strike at Seoul National University"),
("", "Thomas Edison tried to invent a device for car but failed"),
("", "Coffee is good for diet"),
("", "KEPCO fixes light problems"),
("", "Thomas Edison invented the light bulb in 1879"),
]
ranking = rerank(tokenizer, model, device, query, docs)
print(ranking)
# Expected output (example for DeAR-8B-CE):
# [(4, -2.015625), (1, -5.6875), (2, -6.375), (0, -6.5), (3, -6.78125)]
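Since `rerank` returns `(index, score)` pairs sorted by descending score, mapping the ranking back to the original documents is straightforward; a small usage sketch:

```python
# Print documents from most to least relevant (docs is the list defined above)
for rank, (idx, score) in enumerate(ranking, start=1):
    title, passage = docs[idx]
    print(f"{rank}. ({score:.3f}) {passage}")
```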
Listwise rerankers process multiple documents simultaneously and generate rankings through language generation.

import torch
from typing import List
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM
adapter_repo = "abdoelsayed/dear-8b-reranker-listwise-lora-v1"
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
tokenizer = AutoTokenizer.from_pretrained(adapter_repo, use_fast=True, trust_remote_code=True)
# Load model with device map (recommended)
model = AutoPeftModelForCausalLM.from_pretrained(
adapter_repo,
torch_dtype=dtype,
device_map="auto", # Automatically distributes model across available GPUs
trust_remote_code=True,
low_cpu_mem_usage=True
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def create_listwise_prompt(query: str, documents: List[str], max_length: int = 300) -> str:
"""Create prompt for listwise reranking."""
doc_list = "\n".join([f"[{i}] {doc[:max_length]}" for i, doc in enumerate(documents)])
prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.
{doc_list}
Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""
return prompt
# Example usage
query = "When did Thomas Edison invent the light bulb?"
documents = [
"Lightning strike at Seoul National University",
"Thomas Edison tried to invent a device for car but failed",
"Coffee is good for diet",
"KEPCO fixes light problems",
"Thomas Edison invented the light bulb in 1879",
]
prompt = create_listwise_prompt(query, documents)
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
# Generate ranking
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,  # greedy decoding (sampling disabled, so no temperature is needed)
        pad_token_id=tokenizer.pad_token_id,
    )
ranking_text = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(f"Ranking: {ranking_text}")
# Expected output: [4] > [1] > [0] > [3] > [2]
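The generated ranking is plain text, so it must be parsed back into document indices. A minimal helper (illustrative, not part of the repo) that extracts the bracketed identifiers in order:

```python
import re

def parse_ranking(ranking_text: str, num_docs: int) -> list:
    """Extract document indices like '[4] > [1] > ...' in ranked order."""
    ids = [int(m) for m in re.findall(r"\[(\d+)\]", ranking_text)]
    # Keep only valid, first-occurrence indices; append any the model omitted
    seen, order = set(), []
    for i in ids:
        if 0 <= i < num_docs and i not in seen:
            seen.add(i)
            order.append(i)
    order += [i for i in range(num_docs) if i not in seen]
    return order

print(parse_ranking("[4] > [1] > [0] > [3] > [2]", 5))  # [4, 1, 0, 3, 2]
```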
The DeAR models are trained on the DeAR-COT dataset, which contains Chain-of-Thought reasoning annotations from GPT-4o.

Train pointwise rerankers with knowledge distillation from the teacher model:
# Example training command with DeepSpeed
deepspeed --include localhost:0,1,2,3 reranker_train.py \
--deepspeed ./config/ds_config.json \
--output_dir ./models/llama8 \
--model_name_or_path meta-llama/Llama-3.1-8B \
--teacher_model_name_or_path abdoelsayed/llama2-13b-rankllama-teacher \
--temperature 2 \
--alpha 0.1 \
--save_steps 200 \
--dataset_name Tevatron/msmarco-passage \
--bf16 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 2 \
--train_n_passages 2 \
--learning_rate 1e-4 \
--q_max_len 32 \
--p_max_len 196 \
--num_train_epochs 2 \
--logging_steps 10 \
--overwrite_output_dir \
    --dataset_proc_num 32

Or use the training script:

bash train_pointwise.sh
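The `--temperature` and `--alpha` flags govern the distillation objective. As a rough sketch (an assumption about the general loss shape, not the repo's exact implementation), a pointwise distillation loss could combine a supervised term over the candidate passages with a temperature-scaled teacher-matching term:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_scores, teacher_scores, labels, temperature=2.0, alpha=0.1):
    """Hypothetical KD objective: alpha * supervised loss + (1 - alpha) * teacher-matching loss.

    student_scores / teacher_scores: (batch, n_passages) relevance logits
    labels: index of the positive passage per query, shape (batch,)
    """
    # Supervised cross-entropy over the candidate passages
    ce = F.cross_entropy(student_scores, labels)
    # Teacher-matching term: KL between temperature-softened score distributions
    kd = F.kl_div(
        F.log_softmax(student_scores / temperature, dim=-1),
        F.softmax(teacher_scores / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    return alpha * ce + (1 - alpha) * kd
```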
We train the listwise CoT models with LLaMA-Factory, using torch.distributed.run for multi-node/multi-GPU training.
Add DeAR-COT to LLaMA-Factory's data/ directory and register it in the dataset config file.
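For reference, LLaMA-Factory registers datasets in data/dataset_info.json; a hypothetical entry for the CoT data (field names are assumptions, check the actual dataset schema) could look like:

```json
{
  "dear_cot": {
    "file_name": "dear_cot.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```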
Using our helper script:
bash train_listwise.sh

Or configure your own LLaMA-Factory training command with torch.distributed.run for multi-node/multi-GPU training.
We provide pre-computed BM25 outputs for reproducible second-stage reranking evaluation. Download and extract to the data/ directory:
# Download BM25 outputs for BEIR and DL19/20
wget 'https://www.dropbox.com/scl/fi/2ryyzvht45fazrjjuetgk/bm25_beir_dl19_20.zip?rlkey=e3li5e26n12iuq2zrp61ti5tq&st=xnic6wvp&dl=1' -O bm25_beir_dl19_20.zip
# Extract to data directory
unzip bm25_beir_dl19_20.zip -d data
# Final path structure: data/bm25_beir_dl19_20/...
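The run files follow the standard TREC format (`qid Q0 docid rank score tag`). As a sketch of how the first-stage BM25 output feeds the reranker (the file name and corpus lookup are assumptions, not the scripts' actual interface):

```python
from collections import defaultdict

# Parse a TREC run file: qid Q0 docid rank score tag
def load_trec_run(path: str, top_k: int = 100):
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            qid, _, docid, rank, score, _ = line.split()
            if int(rank) <= top_k:
                run[qid].append(docid)
    return run

run = load_trec_run("data/bm25_beir_dl19_20/dl19_bm25.txt")  # hypothetical file name
# For each query, fetch the passages for the candidate docids from your corpus,
# then rerank them with the `rerank` helper defined earlier:
# ranking = rerank(tokenizer, model, device, query_text, [("", corpus[d]) for d in run[qid]])
```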
Evaluate pointwise models on BEIR and TREC DL19/20:

bash run_pointwise.sh

Evaluate listwise models on BEIR and TREC DL19/20:
bash run_listwise.sh

| Method | DL19 nDCG@10 | DL20 nDCG@10 | Avg |
|---|---|---|---|
| BM25 | 50.58 | 47.96 | 49.27 |
| MonoT5-3B | 71.83 | 68.89 | 70.36 |
| RankGPT-4 | 75.59 | 70.56 | 73.08 |
| DeAR-L-8B | 77.91 | 75.63 | 76.77 |
| Method | Covid | NFCorpus | Touche | DBPedia | SciFact | News | Robust04 | Signal |
|---|---|---|---|---|---|---|---|---|
| BM25 | 59.47 | 30.75 | 44.22 | 31.80 | 67.89 | 39.52 | 40.70 | 33.05 |
| MonoT5-3B | 80.71 | 38.97 | 32.41 | 44.45 | 76.57 | 48.49 | 56.71 | 32.55 |
| DeAR-L-8B | 88.36 | 40.56 | 37.23 | 47.12 | 74.95 | 52.89 | 62.18 | 34.40 |
| Method | nDCG@1 | nDCG@5 | nDCG@10 | Avg |
|---|---|---|---|---|
| BM25 | 33.33 | 45.96 | 55.77 | 45.02 |
| RankGPT-4 | 85.71 | 87.49 | 90.45 | 87.88 |
| DeAR-L-8B | 92.86 | 88.04 | 92.01 | 90.97 |
| Method | nDCG@10 | Time (s) | Speed Rank |
|---|---|---|---|
| DeAR-P-8B | 74.5 | 2.2 | 🥇 |
| DeAR-L-8B | 75.54 | 11.16 | ⚡ |
| RankZephyr | 74.2 | 21.58 | 🐢 |
| RankVicuna | 66.82 | 17.86 | 🐢 |
DeAR rerankers are built on top of large language models (3B and 8B LLaMA backbones) with the following modifications:
- Pointwise Models: Add a classification head on top of the base model for scoring individual query-document pairs
- Listwise Models: Use the generative capabilities of LLMs to produce rankings through text generation
- Knowledge Distillation: Leverage Chain-of-Thought reasoning from a 13B teacher model (LLaMA2-RankLLaMA)
The DeAR-COT dataset contains:
- Query-document pairs with relevance annotations
- Chain-of-Thought reasoning from the teacher model
- Training data for both pointwise and listwise rerankers
from datasets import load_dataset
# Load dataset
dataset = load_dataset("abdoelsayed/DeAR-COT")
# Example data point
print(dataset['train'][0])

We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
If you use DeAR in your research, please cite our paper:
@article{abdallah2025dear,
title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
journal={arXiv preprint arXiv:2508.16998},
year={2025}
}

We thank the following projects and their contributors:
- Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation
- LLaMA for the base models
- PEFT for efficient fine-tuning
- Transformers for the modeling infrastructure
- RankLLaMA for inspiring our teacher model design
- Tevatron: Unified Document Retrieval Toolkit across Scale, Language, and Modality
Made with ❤️ by the DISC Team
