This project implements a RAG system enhanced with Grover's algorithm for intelligent context selection. Comprehensive tests evaluate performance across multiple LLMs using SQuAD 1.1 benchmark data.
Objective: Assess the impact of:
- Grover vs. classic context selection
- LLM models (`llama-3-8b`, `mixtral-8x7b`, `phi-3.5`)
- Context variants (`no_context`, `top1`, `top3`)
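The comparison above implies a full cross of models and context variants. A minimal sketch of that evaluation grid (model and variant names are from this README; how each combination is actually run is not shown here):

```python
# Enumerate the evaluation grid: every model paired with every
# context variant, giving 3 x 3 = 9 configurations.
from itertools import product

MODELS = ["llama-3-8b", "mixtral-8x7b", "phi-3.5"]
CONTEXT_VARIANTS = ["no_context", "top1", "top3"]

grid = list(product(MODELS, CONTEXT_VARIANTS))
print(len(grid))  # 9
for model, variant in grid:
    print(model, variant)
```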
Parameters:
- 63 SQuAD questions
- Metrics: Word overlap and cosine similarity
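The two metrics can be sketched as follows. This is an illustrative implementation, not the project's exact one: tokenization is plain whitespace splitting, and the embedding vectors fed to the cosine function are assumed to come from whatever encoder the project uses.

```python
# Sketch of the two evaluation metrics named above.
import math
from collections import Counter


def word_overlap(pred: str, gold: str) -> float:
    """Fraction of gold-answer words that also appear in the prediction."""
    pred_words = set(pred.lower().split())
    gold_words = set(gold.lower().split())
    if not gold_words:
        return 0.0
    return len(pred_words & gold_words) / len(gold_words)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


print(word_overlap("the Eiffel Tower in Paris", "Eiffel Tower"))   # 1.0
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))                   # ~0.707
```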
Key Insights:
- 3-context setups outperform: `llama-3-8b` achieved the highest cosine similarity (0.80)
- Grover ≈ Classic: quality differences < 0.5%
- No-context fails: all models showed a significant quality drop
Parameters:
- 56 SQuAD questions
- End-to-end latency measurement
Results:

| Component | Time |
|---|---|
| Context retrieval (top-10) | 0.297 s |
| Grover selection (top-3) | 0.030 s |
| Answer generation: `mixtral-8x7b` | 1.18 s 🚀 |
| Answer generation: `llama-3-8b` | 2.56 s |
| Answer generation: `phi-3.5` | 2.65 s |
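The stage timings above combine additively into end-to-end latency per model (values copied from the table; the sum is a sketch, ignoring any overhead between stages):

```python
# End-to-end latency = retrieval + Grover selection + generation.
RETRIEVAL_S = 0.297   # context retrieval (top-10)
GROVER_S = 0.030      # Grover selection (top-3)
GENERATION_S = {"mixtral-8x7b": 1.18, "llama-3-8b": 2.56, "phi-3.5": 2.65}

totals = {m: RETRIEVAL_S + GROVER_S + g for m, g in GENERATION_S.items()}
for model, total in totals.items():
    print(f"{model}: {total:.2f} s end-to-end")
# first line: mixtral-8x7b: 1.51 s end-to-end
```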
- 99.11% match between Grover and classic selection
- Single discrepancy caused by Grover's dynamic threshold
- Confirmed Test 1 trends: `top3` > `top1` > `no_context`; `llama-3-8b` dominated quality metrics (cosine: 0.80)
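A classical sketch of why a dynamic threshold can occasionally disagree with fixed top-3 selection: classic selection always takes the three highest-scoring contexts, while a threshold rule (here the mean score, an illustrative choice, not the project's actual threshold) may admit a different set when scores cluster near the cutoff.

```python
# Classic selection: fixed top-3 contexts by relevance score.
def classic_top3(scores: dict[str, float]) -> set[str]:
    return set(sorted(scores, key=scores.get, reverse=True)[:3])


# Threshold selection: keep every context scoring above a dynamic
# cutoff (mean score here, purely for illustration).
def threshold_select(scores: dict[str, float]) -> set[str]:
    cutoff = sum(scores.values()) / len(scores)
    return {ctx for ctx, s in scores.items() if s > cutoff}


scores = {"c1": 0.92, "c2": 0.88, "c3": 0.55, "c4": 0.54, "c5": 0.10}
print(classic_top3(scores))      # {'c1', 'c2', 'c3'}
print(threshold_select(scores))  # mean ≈ 0.598, so only {'c1', 'c2'}
```

The two rules agree on clearly separated scores and diverge only in borderline cases, consistent with a 99.11% match rate.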

- Grover is efficient:
  - Adds only 30 ms latency vs. classic selection
  - Maintains context selection quality
- Context is critical:
  - 3 contexts boost accuracy by 38% vs. no context
- Model tradeoffs:
  - `mixtral-8x7b`: fastest inference (1.18 s)
  - `llama-3-8b`: highest accuracy with contexts
Interactive interface features:
- Multi-model comparisons (`llama`, `mixtral`, `phi-3.5`)
- Context inspection (Grover vs. classic)
- Collapsible answers
| Screen | Preview |
|---|---|
| Home | ![]() |
| Answers | ![]() |
| Top Contexts | ![]() |