A document Q&A assistant that uses RAG (Retrieval-Augmented Generation) to answer questions about PDF documents with page-level citations.
- PDF processing with page-level chunking
- Semantic search using cosine similarity (threshold: 0.3)
- Streaming responses with conversation history
- Precise page citations for transparency
- Python 3.13+
- LM Studio running on `http://localhost:1234`
- Models: `text-embedding-nomic-embed-text-v1.5`, `qwen/qwen3-4b`
```bash
# Install dependencies
uv sync

# Or with pip
pip install lmstudio scikit-learn pymupdf
```
```python
from main import RAG

rag = RAG("document.pdf")
rag.chat("What are the main findings?")  # Streams response to console
```
- Extract - Splits PDF into page-based chunks with metadata
- Embed - Creates embeddings for query and document pages
- Retrieve - Finds top 7 most similar chunks (cosine similarity ≥ 0.3)
- Generate - Uses LM Studio to create contextual responses with page citations
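The retrieve step above can be sketched in a few lines with scikit-learn (function and parameter names here are assumptions for illustration, not the project's actual interface):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def retrieve(query_emb, chunk_embs, chunks, top_k=7, threshold=0.3):
    """Return up to top_k chunks ranked by cosine similarity,
    dropping anything below the similarity threshold."""
    sims = cosine_similarity([query_emb], chunk_embs)[0]
    ranked = np.argsort(sims)[::-1][:top_k]
    return [(chunks[i], float(sims[i])) for i in ranked if sims[i] >= threshold]
```

The threshold filter runs after the top-k cut, so a query that matches nothing well can return fewer than 7 chunks (or none), rather than padding the prompt with irrelevant pages.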