SovereignRAG: Sovereign Retrieval-Augmented Generation for Secure Code Analysis
🚀 Open-source sovereign RAG engine for security code auditing.
- 100% offline
- Runs on personal hardware (Apple M1 fully supported)
- Uses LLMs like phi3, mistral, codellama via Ollama
- Vector DB via ChromaDB
- Preprocessing with spaCy for efficient semantic chunking (see the sketch below)
- BSD 3-Clause License
Sovereign AI - private, secure, under full control.
This project is part of a broader experiment in applying sovereign AI pipelines to security work: a fully private, offline pipeline for security code auditing, free from cloud lock-in, vendor tracking, and corporate surveillance.
Build your own lab. Own your models. Control your data.
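To make the spaCy chunking step concrete, here is a minimal sketch of sentence-aware chunking; the function name and chunk size are illustrative, not taken from this repo's actual implementation:

```python
# Illustrative only: sentence-aware chunking with spaCy.
# chunk_text and max_chars are hypothetical names, not from this repo.
import spacy

nlp = spacy.load("en_core_web_sm")

def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for sent in nlp(text).sents:
        if len(current) + len(sent.text) > max_chars and current:
            chunks.append(current.strip())
            current = ""
        current += sent.text + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Grouping whole sentences keeps each chunk semantically coherent, which tends to improve retrieval quality compared with fixed-width character splits.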
Requirements:
- Python 3.10+
- Ollama
- PyMuPDF
- spaCy
- sentence-transformers
- chromadb
- llama-index and related packages (llama-index-llms-ollama, llama-index-vector-stores-chroma, llama-index-embeddings-huggingface)
- colorama (for colored CLI output)
Install dependencies:
```bash
pip install -r requirements/requirements.txt
python -m spacy download en_core_web_sm
```
For development, install additional dependencies:
```bash
pip install -r requirements/requirements_dev.txt
```
SovereignRAG provides a unified CLI with colored output for better readability. There are two main commands, `ingest` and `query`; a sketch of the CLI's overall shape follows.
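Purely for orientation, the subcommand wiring might look roughly like this (a hand-written sketch with argparse and colorama; the real implementation lives in `src/cli.py` and may differ):

```python
# Sketch of the CLI layout; see src/cli.py for the real implementation.
import argparse
from colorama import Fore, init

init(autoreset=True)  # reset colors after each print

parser = argparse.ArgumentParser(prog="cli.py")
sub = parser.add_subparsers(dest="command", required=True)

ingest = sub.add_parser("ingest", help="Index PDF documents into the vector DB")
ingest.add_argument("--pdf-dir", default="./raw_pdfs/")
ingest.add_argument("--model", default="all-MiniLM-L6-v2")

query = sub.add_parser("query", help="Analyze source code for vulnerabilities")
query.add_argument("--file", "-f", required=True)
query.add_argument("--model", "-m", default="mistral:7b-instruct")
query.add_argument("--ollama-url", default="http://localhost:11434")

args = parser.parse_args()
print(Fore.GREEN + f"Running {args.command}")
```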
To ingest security-related PDF documents into the vector database:
```bash
python src/cli.py ingest [--pdf-dir PATH_TO_PDF_DIR] [--model MODEL_NAME]
```
Options:
- `--pdf-dir`: Directory containing PDF files to index (default: `./raw_pdfs/`)
- `--model`: Sentence-transformers model to use (default: `all-MiniLM-L6-v2`)
Example:
```bash
python src/cli.py ingest --pdf-dir ./security_pdfs/ --model all-MiniLM-L6-v2
```
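Under the hood, ingestion boils down to roughly this flow, shown here as a simplified sketch built directly on the listed dependencies (the collection name and the one-entry-per-document chunking are assumptions; `src/ingest.py` is the authoritative version):

```python
# Simplified ingestion flow: PDF text -> embeddings -> ChromaDB.
# The collection name "security_docs" is an assumption for illustration.
import pathlib

import chromadb
import fitz  # PyMuPDF
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("security_docs")

for pdf in pathlib.Path("./raw_pdfs").glob("*.pdf"):
    text = "".join(page.get_text() for page in fitz.open(pdf))
    # A real pipeline would embed one chunk at a time; here the whole
    # document is embedded as a single entry to keep the sketch short.
    embedding = model.encode([text])[0].tolist()
    collection.add(ids=[pdf.name], documents=[text], embeddings=[embedding])
```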
To analyze a source code file for security vulnerabilities:
```bash
python src/cli.py query --file PATH_TO_SOURCE_FILE [--model MODEL_NAME] [--ollama-url URL]
```
Options:
- `--file` or `-f`: Path to the source code file to analyze (required)
- `--model` or `-m`: Ollama model to use (default: `mistral:7b-instruct`)
- `--ollama-url`: Ollama API URL (default: `http://localhost:11434`)
Example:
```bash
python src/cli.py query --file ./src/app.py --model mistral:7b-instruct
```
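Conceptually, `query` embeds your code, pulls the most relevant passages from ChromaDB, and asks the Ollama model to audit the code with that context. A bare-bones sketch using the raw APIs (the repo's llama-index integrations wrap these steps; the collection name and prompt wording are assumptions):

```python
# Bare-bones query flow: embed the code, retrieve context, ask Ollama.
import chromadb
import requests
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("security_docs")

code = open("./src/app.py").read()
hits = collection.query(
    query_embeddings=[embedder.encode([code])[0].tolist()], n_results=3
)
context = "\n".join(hits["documents"][0])

# Ollama's documented /api/generate endpoint; the prompt is illustrative.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral:7b-instruct",
        "prompt": f"Context:\n{context}\n\nAudit this code for vulnerabilities:\n{code}",
        "stream": False,
    },
)
print(resp.json()["response"])
```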
You can still use the individual scripts directly:
```bash
python src/ingest.py --pdf-dir ./security_pdfs/
python src/query.py --file ./src/app.py
```
Run the full stack (Python app, Ollama with Phi-3, and persistent ChromaDB) via Docker.
Prerequisites:
- Docker and Docker Compose
Build images and start services:
```bash
docker compose build
docker compose up -d ollama
```
Pull a model (e.g., Mistral 7B Instruct). Use `exec` to run the command inside the running service:
```bash
docker compose exec ollama ollama pull mistral:7b-instruct
```
If the service isn't ready yet, wait a few seconds or check its logs:
```bash
docker compose logs -f ollama
```
Note: if a model tag is not found (e.g., "pull model manifest: file does not exist"), try a known tag like `mistral:7b-instruct` or `phi3:mini`, or list the models already pulled locally:
```bash
docker compose exec ollama ollama list
```
You can also browse the available tags at https://ollama.com/library.
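The same list is also available programmatically via Ollama's HTTP API, which can be handy for scripting (a small sketch using the documented `/api/tags` endpoint):

```python
# List locally available Ollama models via the HTTP API.
import requests

tags = requests.get("http://localhost:11434/api/tags").json()
for model in tags["models"]:
    print(model["name"])
```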
Ingest PDFs (mounted at ./raw_pdfs on the host):
```bash
docker compose run --rm app python src/cli.py ingest --pdf-dir ./raw_pdfs/ --model all-MiniLM-L6-v2
```
Query code for security analysis. Note: when running inside Docker, point the CLI at the Ollama service URL (`http://ollama:11434`) rather than `localhost`:
```bash
# Single file (use --path instead of --file)
docker compose run --rm app python src/cli.py query --path ./src/query.py --model mistral:7b-instruct --ollama-url http://ollama:11434

# Directory + extension filter
docker compose run --rm app python src/cli.py query --path ./src --extension py --model mistral:7b-instruct --ollama-url http://ollama:11434
```
Outputs and data persistence:
- ChromaDB data: `./chroma_db` (host) is mounted to `/app/chroma_db` (container)
- Reports: `./output` (host) is mounted to `/app/output` (container)
- PDFs: `./raw_pdfs` (host) is mounted to `/app/raw_pdfs` (container)
Stop services:
```bash
docker compose down
```
Use the dev image to get pytest, ruff, and the other developer tools from `requirements/requirements_dev.txt`.
Build and open a shell in the dev container:
```bash
docker compose build app-dev
docker compose run --rm app-dev bash
```
Inside the dev container, run common tasks:
```bash
# Format
ruff format .

# Lint
ruff check .

# Tests
pytest -q

# App commands (same as prod), pointing to the Ollama service
python src/cli.py ingest --pdf-dir ./raw_pdfs/ --model all-MiniLM-L6-v2
python src/cli.py query --path ./src --extension py --model mistral:7b-instruct --ollama-url http://ollama:11434
```
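As an example of the kind of test pytest will pick up, a minimal check of sentence-splitting behavior could look like this (a hypothetical test file, not one that ships with the repo):

```python
# tests/test_chunking.py -- hypothetical example, not part of the repo.
import spacy

def test_sentence_splitting():
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("First sentence. Second sentence.")
    assert len(list(doc.sents)) == 2
```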
This project uses ruff for code formatting. To format all Python files in the project, run:
```bash
ruff format .
```
This will automatically format your code according to the style defined in the `pyproject.toml` file.

