A comprehensive RAG (Retrieval-Augmented Generation) pipeline built with LangGraph, featuring document processing, vector storage, retrieval, and generation, with monitoring and safety measures built in.
User Query → Document Processing → Vector Storage → Retrieval → LLM Generation → Response
 (FastAPI)     (Unstructured)       (ChromaDB)     (LangGraph)      (Llama)      (Monitoring)
full-rag-pipeline/
├── .github/
│   └── workflows/
│       ├── ci.yml
│       └── cd.yml
├── infra/
│   └── terraform/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── src/
│   ├── core/
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── database.py
│   │   └── logging.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── ingestion.py
│   │   ├── processing.py
│   │   └── synthetic_data.py
│   ├── vectorstore/
│   │   ├── __init__.py
│   │   ├── faiss_store.py
│   │   ├── postgres_store.py
│   │   └── embeddings.py
│   ├── retrieval/
│   │   ├── __init__.py
│   │   ├── retrievers.py
│   │   └── rerankers.py
│   ├── generation/
│   │   ├── __init__.py
│   │   ├── llm_models.py
│   │   └── prompts.py
│   ├── graph/
│   │   ├── __init__.py
│   │   ├── nodes.py
│   │   ├── edges.py
│   │   └── workflow.py
│   ├── safety/
│   │   ├── __init__.py
│   │   ├── content_filter.py
│   │   └── guardrails.py
│   ├── monitoring/
│   │   ├── __init__.py
│   │   ├── metrics.py
│   │   └── logging.py
│   └── api/
│       ├── __init__.py
│       ├── main.py
│       ├── routes.py
│       └── schemas.py
├── tests/
│   ├── __init__.py
│   ├── test_api.py
│   ├── test_graph.py
│   ├── test_retrieval.py
│   └── test_generation.py
├── data/
│   ├── raw/
│   ├── processed/
│   └── synthetic/
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_embedding_evaluation.ipynb
│   └── 03_model_evaluation.ipynb
├── scripts/
│   ├── setup_data.py
│   ├── run_pipeline.py
│   └── evaluate_model.py
├── .env.example
├── .gitignore
├── pyproject.toml
├── README.md
├── Dockerfile
└── docker-compose.yml
- LangGraph: Workflow orchestration and state management
- LangChain: LLM integration and RAG components
- FastAPI: REST API framework
- Pydantic: Data validation and settings management
- Primary: Llama 3.1 (free, local deployment)
- Alternative: OpenAI GPT (with API key)
- Embeddings: sentence-transformers (free) or OpenAI embeddings
- ChromaDB: Local vector storage
- Unstructured: Document parsing and chunking
- PyPDF2: PDF processing
- Prometheus + Grafana: Metrics and monitoring (free)
- LangSmith: LangChain monitoring (free tier)
- Guardrails: Content safety filters
- Docker: Containerization
- Terraform: Infrastructure as Code
- GitHub Actions: CI/CD pipeline
- Poetry: Dependency management
- Support for PDF, DOCX, TXT, and web scraping
- Intelligent chunking with overlap
- Metadata extraction and preservation
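The intelligent chunking with overlap could look like the following minimal sketch; the function name and defaults are illustrative, not the actual `src/data/processing.py` API:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[dict]:
    """Split text into fixed-size chunks that share `overlap` characters,
    keeping positional metadata alongside each chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append({"text": piece, "start": start, "end": start + len(piece)})
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap ensures a retrieved chunk carries enough surrounding context to stand on its own.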
- Multiple vector store backends
- Hybrid search (semantic + keyword)
- Reranking for improved relevance
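One common way to combine semantic and keyword results is reciprocal rank fusion (RRF); a minimal sketch under that assumption, not necessarily what `retrievers.py` implements:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked document-ID lists (e.g. vector search and BM25)
    into a single ranking; lower ranks contribute higher scores."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Documents ranked highly by both retrievers float to the top.
fused = reciprocal_rank_fusion([["d1", "d2", "d3"], ["d3", "d1", "d4"]])
```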
- Multi-step reasoning
- Conditional routing
- Error handling and retries
- State persistence
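A minimal LangGraph workflow with conditional routing and a bounded retry loop might look like this; the state fields and node bodies are placeholders rather than the project's actual `src/graph/workflow.py`:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class RAGState(TypedDict, total=False):
    query: str
    documents: list[str]
    answer: str
    retries: int

def retrieve(state: RAGState) -> dict:
    # Placeholder: fetch documents for state["query"] from the vector store.
    return {"documents": ["..."], "retries": state.get("retries", 0)}

def generate(state: RAGState) -> dict:
    # Placeholder: call the LLM with the retrieved context.
    return {"answer": "...", "retries": state["retries"] + 1}

def route(state: RAGState) -> str:
    # Conditional routing: loop back to retrieval if the answer came out empty.
    return "retry" if not state["answer"].strip() and state["retries"] < 3 else "done"

builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.set_entry_point("retrieve")
builder.add_edge("retrieve", "generate")
builder.add_conditional_edges("generate", route, {"retry": "retrieve", "done": END})
graph = builder.compile()
result = graph.invoke({"query": "What is the main topic?"})
```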
- Prompt engineering with templates
- Response streaming
- Context window management
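Context window management often reduces to packing retrieved chunks into a prompt template until a token budget is exhausted. A sketch, using a whitespace split as a rough token proxy:

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, chunks: list[str], max_context_tokens: int = 2048) -> str:
    """Greedily add retrieved chunks until the (approximate) token budget is hit."""
    parts, used = [], 0
    for chunk in chunks:
        cost = len(chunk.split())  # crude proxy; a real tokenizer would be more accurate
        if used + cost > max_context_tokens:
            break
        parts.append(chunk)
        used += cost
    return PROMPT_TEMPLATE.format(context="\n\n".join(parts), question=question)
```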
- Response validation
- Content filtering
- Response quality scoring
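Response validation can start as a blocklist check plus a simple grounding score; the terms and scoring below are purely illustrative, not the project's `content_filter.py`:

```python
BLOCKED_TERMS = {"ssn", "credit card"}  # illustrative blocklist

def validate_response(answer: str, context: list[str]) -> dict:
    """Flag blocked terms and score how much of the answer's vocabulary
    actually appears in the retrieved context."""
    lowered = answer.lower()
    flagged = [term for term in BLOCKED_TERMS if term in lowered]
    answer_words = set(lowered.split())
    context_words = set(" ".join(context).lower().split())
    grounding = len(answer_words & context_words) / max(len(answer_words), 1)
    return {"flagged": flagged, "grounding_score": round(grounding, 2), "allowed": not flagged}
```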
- Performance metrics
- Request/response logging
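With `prometheus_client`, per-request latency and error counts can be recorded by a small decorator; a sketch of what `src/monitoring/metrics.py` might expose:

```python
import time
from prometheus_client import Counter, Histogram

REQUESTS = Counter("rag_requests_total", "RAG queries served", ["status"])
LATENCY = Histogram("rag_request_seconds", "End-to-end query latency")

def observed(fn):
    """Wrap a pipeline entry point to record latency and success/error counts."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            REQUESTS.labels(status="ok").inc()
            return result
        except Exception:
            REQUESTS.labels(status="error").inc()
            raise
        finally:
            LATENCY.observe(time.perf_counter() - start)
    return wrapper
```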
# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -

# Clone repository
git clone https://github.com/SamuelAMT/full-rag-pipeline.git
cd full-rag-pipeline
# Install dependencies
poetry install
# Download the Llama model
huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF --include "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf" --local-dir ./models
# Setup environment
cp .env.example .env
# Download sample data
python scripts/setup_data.py
# Start services
docker-compose up -d
# Run the application
poetry run uvicorn src.api.main:app --reload

# Health check
curl http://localhost:8000/health
# Upload documents
curl -X POST "http://localhost:8000/documents/upload" \
-H "Content-Type: multipart/form-data" \
-F "[email protected]"
# Query the RAG system
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"query": "What is the main topic of the document?"}'# LLM Configuration
LLAMA_BASE_URL=http://localhost:11434
OPENAI_API_KEY=openai_key_here
# Vector Store
VECTOR_STORE_TYPE=chromadb
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
DEBUG=true
# Monitoring
LANGSMITH_API_KEY=langsmith_key
LANGSMITH_PROJECT=full-rag-pipeline
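These variables would typically be loaded into a typed settings object. A sketch of what `src/core/config.py` might look like with pydantic-settings; the field names simply mirror the variables above:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Typed view of the .env values shown above."""
    model_config = SettingsConfigDict(env_file=".env")

    llama_base_url: str = "http://localhost:11434"
    openai_api_key: str | None = None
    vector_store_type: str = "chromadb"
    api_host: str = "0.0.0.0"
    api_port: int = 8000
    debug: bool = False
    langsmith_api_key: str | None = None
    langsmith_project: str = "full-rag-pipeline"

settings = Settings()
```

# Start development environment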
poetry shell
docker-compose up -d postgres redis
# Run tests
poetry run pytest
# Run with hot reload
poetry run uvicorn src.api.main:app --reload

# Unit tests
poetry run pytest tests/
# Integration tests
poetry run pytest tests/integration/
# Performance tests
poetry run pytest tests/performance/

# Build and push Docker image
docker build -t samuelamt/full-rag-pipeline:latest .
docker push samuelamt/full-rag-pipeline:latest
# Deploy with Terraform
cd infra/terraform
terraform init
terraform plan
terraform apply

- Precision@K: Relevance of top K retrieved documents
- Recall@K: Coverage of relevant documents in top K
- MRR: Mean Reciprocal Rank
- NDCG: Normalized Discounted Cumulative Gain
- BLEU: N-gram overlap with reference
- ROUGE: Summary quality metrics
- BERTScore: Semantic similarity
- Custom Relevance: Domain-specific scoring
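As a concrete reference, two of the retrieval metrics above reduce to a few lines of Python:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k

def mean_reciprocal_rank(rankings: list[list[str]], relevant: list[set[str]]) -> float:
    """Average of 1/rank of the first relevant document per query."""
    total = 0.0
    for retrieved, rel in zip(rankings, relevant):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)

# One query whose first relevant hit sits at rank 2 → MRR = 0.5.
print(mean_reciprocal_rank([["d9", "d1", "d3"]], [{"d1"}]))
```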
- Request/response times
- Token usage and costs
- Error rates and types
- User satisfaction scores
- Model performance metrics
- High error rates
- Slow response times
- Unusual usage patterns
- Model performance degradation
- Input validation and sanitization
- Output filtering for harmful content
- Rate limiting and abuse detection
- PII detection and redaction (see the sketch after this list)
- API key rotation
- Network security groups
- Encrypted data storage
- Audit logging
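PII redaction can start from a handful of regular expressions; the patterns below are illustrative only and nowhere near production coverage:

```python
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace matched PII spans with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

print(redact_pii("Reach me at jane@example.com or 555-123-4567."))
```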
- Load balancing across API instances
- Vector store sharding
- Async processing queues
- CDN for static assets
- Response caching (see the sketch after this list)
- Batch processing
- Model quantization
- Hardware acceleration
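Response caching can start with an in-memory map keyed on the normalized query; a sketch (a real deployment would more likely use the Redis service from docker-compose, with a TTL):

```python
import hashlib
from typing import Callable

class ResponseCache:
    """Naive in-process cache; identical queries skip the full pipeline."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def _key(self, query: str, top_k: int) -> str:
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(f"{normalized}|{top_k}".encode()).hexdigest()

    def get_or_compute(self, query: str, top_k: int,
                       compute: Callable[[str, int], str]) -> str:
        key = self._key(query, top_k)
        if key not in self._store:
            self._store[key] = compute(query, top_k)
        return self._store[key]
```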
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
MIT License - see LICENSE file for details.
- Documentation: README
- Issues: GitHub Issues
- Discussions: GitHub Discussions