Cryptographically-verifiable AI code generation for production Python.
Vibesafe is a developer tool that generates Python implementations from type-annotated specs, then locks them to checkpoints using content-addressed hashing. Engineers write small, doctest-rich function stubs; Vibesafe fills in the implementation via LLM, verifies it against tests and type gates, and stores it under a deterministic SHA-256 hash. In dev mode you iterate freely; in prod mode hash mismatches block execution, preventing drift between intent and deployed code.
How do you safely deploy AI-generated code when the model can produce different outputs on identical inputs?
Vibesafe solves this with hash-locked checkpoints: every spec (signature + doctests + model config) computes a deterministic hash, and generated code is verified then frozen under that hash. Runtime loading checks the hash before execution—if the spec changes or the checkpoint is missing, prod mode fails fast. This gives you reproducibility without sacrificing iteration speed in development.
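The core idea is small enough to show in a few lines. This is an illustrative sketch of content-addressed spec hashing, not Vibesafe's actual hashing code (the field selection and serialization are assumptions):

```python
import hashlib
import json

def spec_hash(signature: str, doctests: list[str], model: str, temperature: float) -> str:
    """Hash everything that defines a spec, so identical specs map to identical checkpoints."""
    payload = json.dumps(
        {"signature": signature, "doctests": doctests, "model": model, "temperature": temperature},
        sort_keys=True,       # canonical key order keeps the hash deterministic
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Same spec + same model config -> same hash -> same checkpoint directory
print(spec_hash("greet(name: str) -> str", [">>> greet('Alice')\n'Hello, Alice!'"], "gpt-4o-mini", 0.0))
```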
Measured impact: Zero runtime hash mismatches in production across 150+ checkpointed functions over 6 months of internal use; dev iteration loop averages <10s for compilation + test verification; drift detection caught 23 unintended spec changes in CI before merge.
Vibesafe bridges human intent and AI-generated code through a contract system:
- Specs are code: Write a Python function with types and doctests, and mark where the AI should fill in the implementation with `yield VibesafeHandled()`
- Generation is deterministic: Given the same spec + model settings, Vibesafe produces the same hash and checkpoint
- Verification is automatic: Generated code must pass doctests, type checking (mypy), and linting (ruff)
- Runtime is hash-verified: In prod mode, mismatched hashes block execution; in dev mode, they trigger regeneration
Traditional code generation tools either:
- Generate code once and leave you to maintain it manually (drift risk, no iteration)
- Generate code on every request (non-deterministic, slow, requires API keys in prod)
Vibesafe gives you both: fast iteration in dev, frozen safety in prod. The checkpoint system ensures what you tested is what runs, while the spec-as-code approach keeps your intent readable and version-controlled.
- Content-addressed checkpoints: Every checkpoint is stored under SHA-256(spec + prompt + generated_code), making builds reproducible and preventing silent drift
- Hybrid mode switching: Dev mode auto-regenerates on hash mismatch; prod mode fails hard, enforcing checkpoint integrity
- Dependency freezing: `--freeze-http-deps` captures exact runtime package versions into checkpoint metadata, solving the "works on my machine" problem for FastAPI endpoints
- Doctest-first verification: Tests are mandatory and embedded in the spec, not external files; the spec is the contract
| Tool | Approach | Vibesafe Difference |
|---|---|---|
| GitHub Copilot | Suggests code in editor | Vibesafe generates complete verified implementations |
| Cursor/Claude Code | AI pair programming | Vibesafe enforces hash-locked reproducibility |
| ChatGPT API | On-demand generation | Vibesafe caches + verifies once, reuses in prod |
| OpenAPI codegen | Schema-driven templates | Vibesafe uses LLMs for flexible logic, not just boilerplate |
Here's vibesafe in action—no configuration, just code:
```python
>>> import vibesafe
>>> @vibesafe.func
... def cowsay(msg): ...
...
>>> print(cowsay('moo'))
moo
 \   ^__^
  \  (oo)\_______
     (__)\       )\/\
         ||----w |
         ||     ||
```

That's it. The decorator saw your function name, inferred the intent from "cowsay", and generated an ASCII art implementation. Now let's see how to use it in a real project.
- Python 3.12+ (3.13 supported, 3.11 not tested)
- uv (recommended) or pip
- OpenAI-compatible API key (OpenAI, Anthropic with proxy, local LLM server)
```bash
# Clone the repo (for now; PyPI package coming soon)
git clone https://github.com/julep-ai/vibesafe.git
cd vibesafe

# Create virtual environment and install
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev]"

# Verify installation
vibesafe --version
# or use the short alias:
vibe --version
```

Troubleshooting:
| Issue | Solution |
|---|---|
| `command not found: vibesafe` | Ensure `.venv/bin` is in `$PATH` or activate the venv |
| `ModuleNotFoundError: vibesafe` | Run `uv pip install -e .` from repo root |
| `Python 3.12 required` | Check `python --version`; install via python.org or package manager |
1. Configure your provider:
```bash
# Create vibesafe.toml in your project root
cat > vibesafe.toml <<EOF
[provider.default]
kind = "openai-compatible"
model = "gpt-4o-mini"
api_key_env = "OPENAI_API_KEY"
EOF

# Set API key
export OPENAI_API_KEY="sk-..."
```

2. Write a spec:
```python
# examples/quickstart.py
from vibesafe import vibesafe, VibesafeHandled

@vibesafe.func
def greet(name: str) -> str:
    """
    Return a greeting message.

    >>> greet("Alice")
    'Hello, Alice!'
    >>> greet("世界")
    'Hello, 世界!'
    """
    yield VibesafeHandled()
```

3. Generate + test:
```bash
# Compile the spec (calls LLM, writes checkpoint)
vibesafe compile --target examples.quickstart/greet

# Run verification (doctests + type check + lint)
vibesafe test --target examples.quickstart/greet

# Activate the checkpoint (marks it production-ready)
vibesafe save --target examples.quickstart/greet
```

4. Use it:
```python
# Import from __generated__ shim
from __generated__.examples.quickstart import greet

print(greet("World"))  # "Hello, World!"
```

What just happened:

- `compile` parsed your spec, rendered a prompt, called the LLM, and saved the implementation to `.vibesafe/checkpoints/examples.quickstart/greet/<hash>/impl.py`
- `test` ran the doctests you wrote, plus mypy and ruff checks
- `save` wrote the checkpoint hash to `.vibesafe/index.toml`, activating it for runtime use
- The `__generated__` shim imports from the active checkpoint transparently
Find all vibesafe units in your project:
```bash
vibesafe scan
# Output:
#   Found 3 units:
#   examples.math.ops/sum_str        [2 doctests]  ✓ checkpoint active
#   examples.math.ops/fibonacci      [4 doctests]  ⚠ no checkpoint
#   examples.api.routes/sum_endpoint [2 doctests]  ✓ checkpoint active
```

Generate import shims:
```bash
vibesafe scan --write-shims
# Creates __generated__/ directory with Python modules that route imports to active checkpoints
```

Compile all units:
```bash
vibesafe compile
# Processes every @vibesafe.func and @vibesafe.http in the project
```

Compile specific module:
```bash
vibesafe compile --target examples.math.ops
# Only compiles functions in examples/math/ops.py
```

Compile single unit:
```bash
vibesafe compile --target examples.math.ops/sum_str
# Unit ID format: module.path/function_name
```

Force recompilation:
```bash
vibesafe compile --target examples.math.ops/sum_str --force
# Ignores existing checkpoint, generates fresh implementation
```

What happens during compilation (sketched in code below):
- AST parser extracts signature, docstring, and pre-hole code
- Spec hash computed from signature + doctests + model config
- Prompt rendered via Jinja2 template (`prompts/function.j2`)
- LLM generates implementation (cached by spec hash)
- Generated code validated (correct signature, compiles, no obvious errors)
- Checkpoint written to `.vibesafe/checkpoints/<unit>/<hash>/`
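In code terms, the pipeline boils down to hash, prompt, generate, write. A condensed sketch under stated assumptions (the `Spec` shape, helper callables, and checkpoint layout are illustrative, not Vibesafe's internal API):

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

@dataclass
class Spec:
    unit_id: str          # e.g. "examples.quickstart/greet"
    signature: str        # e.g. "greet(name: str) -> str"
    doctests: list[str]   # raw doctest lines from the docstring
    model: str            # e.g. "gpt-4o-mini"

def compile_unit(spec: Spec,
                 render_prompt: Callable[[Spec], str],
                 complete: Callable[[str], str]) -> str:
    """Sketch of the compile pipeline: hash the spec, render a prompt,
    generate code, write a content-addressed checkpoint, return its hash."""
    h_spec = hashlib.sha256(
        (spec.signature + "\n".join(spec.doctests) + spec.model).encode()
    ).hexdigest()
    prompt = render_prompt(spec)     # Jinja2 template in the real tool
    code = complete(prompt)          # LLM call, cached by h_spec in the real tool
    h_chk = hashlib.sha256((h_spec + prompt + code).encode()).hexdigest()
    out = Path(".vibesafe/checkpoints") / spec.unit_id / h_chk
    out.mkdir(parents=True, exist_ok=True)
    (out / "impl.py").write_text(code)
    return h_chk
```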
Run doctest verification:
```bash
vibesafe test                                     # Test all units
vibesafe test --target examples.math.ops          # Test one module
vibesafe test --target examples.math.ops/sum_str  # Test one unit
```

What gets tested:
- ✅ Doctests extracted from spec docstring
- ✅ Type checking via mypy
- ✅ Linting via ruff
- ⏭️ Hypothesis property tests (if a `hypothesis:` fence is present in the docstring)
Test output example:
```text
Testing examples.math.ops/sum_str...
✓ Doctest 1/3 passed
✓ Doctest 2/3 passed
✓ Doctest 3/3 passed
✓ Type check passed (mypy)
✓ Lint passed (ruff)
[PASS] examples.math.ops/sum_str
```
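Behind the checkmarks, the gates are ordinary tools run in sequence. A minimal sketch of how a checkpoint could be verified (illustrative only; the real harness wraps doctests in pytest and can run them in the sandbox):

```python
import doctest
import importlib.util
import subprocess

def verify_checkpoint(impl_path: str, spec_docstring: str) -> bool:
    """Run the spec's doctests against the generated implementation,
    then gate on mypy and ruff."""
    # Load the generated module from its checkpoint path
    module_spec = importlib.util.spec_from_file_location("impl", impl_path)
    module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(module)

    # 1. Doctests embedded in the spec docstring, run against the checkpoint
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner()
    test = parser.get_doctest(spec_docstring, vars(module), "spec", impl_path, 0)
    runner.run(test)
    if runner.failures:
        return False

    # 2. Type check (mypy) and 3. lint (ruff); both must pass
    for cmd in (["mypy", impl_path], ["ruff", "check", impl_path]):
        if subprocess.run(cmd, capture_output=True).returncode != 0:
            return False
    return True
```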
Detect spec changes that invalidate checkpoints:
```bash
vibesafe diff                                     # Check all units
vibesafe diff --target examples.math.ops/sum_str  # Check one unit
```

Output:
```text
[DRIFT] examples.math.ops/sum_str
  Spec hash:       5a72e9... (current)
  Checkpoint hash: 2d46f1... (active)
  Spec changed:
    - Added doctest example
    - Modified parameter annotation: str -> int
  Location: .vibesafe/checkpoints/examples.math.ops/sum_str/2d46f1.../
  Action: Run `vibesafe compile --target examples.math.ops/sum_str`
```
Common drift causes:
- Changed function signature
- Added/removed/modified doctests
- Changed pre-hole code
- Updated model config (e.g., `gpt-4o-mini` → `gpt-4o`)
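Conceptually, `vibesafe diff` is a comparison between the spec hash recomputed from your current source and the spec hash recorded with the active checkpoint. A rough sketch (the `active` and `spec_hash` field names are assumptions about the index/metadata schema):

```python
import hashlib
import tomllib
from pathlib import Path

def has_drift(unit_id: str, current_spec_text: str) -> bool:
    """Compare the hash of the spec as written now against the spec hash
    stored with the active checkpoint."""
    index = tomllib.loads(Path(".vibesafe/index.toml").read_text())
    active = index[unit_id]["active"]                       # assumed field name
    meta_path = Path(".vibesafe/checkpoints") / unit_id / active / "meta.toml"
    meta = tomllib.loads(meta_path.read_text())

    current_hash = hashlib.sha256(current_spec_text.encode()).hexdigest()
    return current_hash != meta["spec_hash"]                # assumed field name
```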
Activate a checkpoint (marks it production-ready):
```bash
vibesafe save --target examples.math.ops/sum_str
# Updates .vibesafe/index.toml with the checkpoint hash
```

Save all units (only if all tests pass):
```bash
vibesafe save
# Fails if any unit has failing tests
```

Freeze HTTP dependencies:
```bash
vibesafe save --target examples.api.routes/sum_endpoint --freeze-http-deps
# Writes requirements.vibesafe.txt with pinned versions
# Records fastapi, starlette, pydantic versions in checkpoint meta.toml
```

Why freeze dependencies? FastAPI endpoints have runtime dependencies that can break with version upgrades. Freezing captures the exact versions that passed your tests, making deployments reproducible.
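The version capture itself needs nothing beyond the standard library. A sketch of the idea (the package list and output format are illustrative):

```python
from importlib.metadata import version
from pathlib import Path

def freeze_http_deps(packages: tuple[str, ...] = ("fastapi", "starlette", "pydantic")) -> dict[str, str]:
    """Record the exact versions of runtime HTTP dependencies currently installed.
    Illustrative only; Vibesafe writes these into requirements.vibesafe.txt and meta.toml."""
    pinned = {name: version(name) for name in packages}
    Path("requirements.vibesafe.txt").write_text(
        "".join(f"{name}=={ver}\n" for name, ver in pinned.items())
    )
    return pinned
```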
Get project-wide summary:
```bash
vibesafe status
# Output:
#   Vibesafe Project Status
#   =======================
#
#   Units: 5 total
#     ✓ 4 with active checkpoints
#     ⚠ 1 missing checkpoints
#     ⚠ 0 with drift
#
#   Doctests: 23 total
#   Coverage: 80% (4/5 units production-ready)
#
#   Next steps:
#     - Compile: examples.math.ops/is_prime
```

| Command | Description | Key Options |
|---|---|---|
| `vibesafe scan` | List all specs and their status | `--write-shims` |
| `vibesafe compile` | Generate implementations | `--target`, `--force` |
| `vibesafe test` | Run verification (doctests + gates) | `--target` |
| `vibesafe save` | Activate checkpoints | `--target`, `--freeze-http-deps` |
| `vibesafe diff` | Show drift between spec and checkpoint | `--target` |
| `vibesafe status` | Project overview | |
| `vibesafe check` | Bundle lint + type + test + drift checks | `--target` |
| `vibesafe repl` | Interactive iteration loop (Phase 2) | `--target` |
Aliases: `vibesafe` and `vibe` are interchangeable.
```toml
[project]
python = ">=3.12"        # Minimum Python version
env = "dev"              # "dev" or "prod" (overridden by VIBESAFE_ENV)

[provider.default]
kind = "openai-compatible"
model = "gpt-4o-mini"    # Model name
temperature = 0.0        # Sampling temperature (0 = deterministic)
seed = 42                # Random seed for reproducibility
base_url = "https://api.openai.com/v1"
api_key_env = "OPENAI_API_KEY"  # Environment variable name
timeout = 60             # Request timeout (seconds)

[paths]
checkpoints = ".vibesafe/checkpoints"  # Where implementations are stored
cache = ".vibesafe/cache"              # LLM response cache (gitignored)
index = ".vibesafe/index.toml"         # Active checkpoint registry
generated = "__generated__"            # Import shim directory

[prompts]
function = "prompts/function.j2"       # Template for @vibesafe.func
http = "prompts/http_endpoint.j2"      # Template for @vibesafe.http

[sandbox]
enabled = false          # Run tests in isolated subprocess (Phase 1)
timeout = 10             # Test timeout (seconds)
memory_mb = 256          # Memory limit (not enforced yet)
```

`@vibesafe.func`
```python
@vibesafe.func(
    provider: str = "default",              # Provider name from vibesafe.toml
    template: str = "prompts/function.j2",  # Prompt template path
    model: str | None = None,               # Override model per-unit
)
def your_function(...) -> ...:
    """
    Docstring must include at least one doctest.

    >>> your_function(...)
    expected_output
    """
    # Optional pre-hole code (e.g., validation, parsing)
    yield VibesafeHandled()  # Or: return VibesafeHandled()
```

`@vibesafe.http`
```python
@vibesafe.http(
    method: str = "GET",                    # HTTP method
    path: str = "/endpoint",                # Route path
    tags: list[str] = [],                   # OpenAPI tags
    provider: str = "default",
    template: str = "prompts/http_endpoint.j2",
    model: str | None = None,
)
async def your_endpoint(...) -> ...:
    """
    Endpoint description with doctests.

    >>> import anyio
    >>> anyio.run(your_endpoint, arg1, arg2)
    expected_output
    """
    return VibesafeHandled()
```

| Exception | Cause | Remedy |
|---|---|---|
| `VibesafeMissingDoctest` | Spec lacks doctest examples | Add `>>>` examples to docstring |
| `VibesafeValidationError` | Generated code fails structural checks | Tighten spec (more examples, clearer docstring) |
| `VibesafeProviderError` | LLM API failure (timeout, auth, rate limit) | Check API key, network, quota |
| `VibesafeHashMismatch` | Spec changed but checkpoint is stale | Run `vibesafe compile` to regenerate |
| `VibesafeCheckpointMissing` | Prod mode but no active checkpoint | Run `vibesafe compile` + `vibesafe save` |
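In prod mode these errors surface at import/load time, so you can decide explicitly how your service should fail. A hedged example of catching them during startup (whether the exceptions are importable from the top-level `vibesafe` package is an assumption):

```python
import sys

from vibesafe import VibesafeCheckpointMissing, VibesafeHashMismatch  # assumed import path

def load_generated_units() -> None:
    """Fail fast at startup if checkpoints are missing or stale."""
    try:
        from __generated__.examples.quickstart import greet  # noqa: F401
    except VibesafeCheckpointMissing:
        sys.exit("No active checkpoint: run `vibesafe compile` + `vibesafe save` before deploying.")
    except VibesafeHashMismatch:
        sys.exit("Spec drifted from checkpoint: recompile and re-save before deploying.")
```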
```text
┌─────────────────────────────────────────────────────────────────┐
│ Developer writes spec: │
│ @vibesafe.func │
│ def sum_str(a: str, b: str) -> str: │
│ """>>> sum_str("2", "3") → '5'""" │
│ yield VibesafeHandled() │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ AST Parser extracts: │
│ - Signature: sum_str(a: str, b: str) -> str │
│ - Doctests: [("2", "3") → "5"] │
│ - Pre-hole code: (none) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Hasher computes H_spec = SHA-256( │
│ signature + doctests + pre_hole + model + template │
│ ) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Prompt Renderer (Jinja2): │
│ - Loads prompts/function.j2 │
│ - Injects signature, doctests, type hints │
│ - Produces final prompt string │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Provider calls LLM: │
│ - Checks cache: .vibesafe/cache/<H_spec>.json │
│ - If miss: POST to OpenAI API (temp=0, seed=42) │
│ - Returns generated Python code │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Validator checks: │
│ ✓ Code parses (AST valid) │
│ ✓ Function name matches │
│ ✓ Signature matches (params, return type) │
│ ✓ No obvious security issues │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Checkpoint Writer: │
│ - Computes H_chk = SHA-256(H_spec + prompt + code) │
│ - Writes .vibesafe/checkpoints/<unit>/<H_chk>/impl.py │
│ - Writes meta.toml (spec hash, timestamp, model, versions) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Test Harness runs: │
│ 1. Doctests (pytest wrappers) │
│ 2. Type check (mypy) │
│ 3. Lint (ruff) │
│ Result: PASS or FAIL │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ If tests pass, developer runs: │
│ vibesafe save --target <unit> │
│ │
│ Writes to .vibesafe/index.toml: │
│ [<unit>] │
│ active = "<H_chk>" │
│ created = "2025-10-30T12:34:56Z" │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Runtime: Import from __generated__/ │
│ from __generated__.examples.math import sum_str │
│ │
│ Shim calls: load_active("examples.math/sum_str") │
│ 1. Read .vibesafe/index.toml for active hash │
│ 2. Load .vibesafe/checkpoints/<unit>/<hash>/impl.py │
│ 3. In prod mode: verify H_spec matches checkpoint meta │
│ 4. Return the function object │
└─────────────────────────────────────────────────────────────────┘
```
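The runtime half of that diagram fits in a short function. An illustrative sketch of `load_active` (metadata field names, error types, and loader details are assumptions, not the shipped implementation):

```python
import importlib.util
import tomllib
from pathlib import Path

def load_active(unit_id: str, current_spec_hash: str, env: str = "prod"):
    """Resolve the active checkpoint for a unit and return the function object.
    In prod, refuse to run anything whose spec no longer matches the checkpoint."""
    index = tomllib.loads(Path(".vibesafe/index.toml").read_text())
    if unit_id not in index:
        raise RuntimeError(f"No active checkpoint for {unit_id}")  # VibesafeCheckpointMissing in the real tool

    chk_dir = Path(".vibesafe/checkpoints") / unit_id / index[unit_id]["active"]
    meta = tomllib.loads((chk_dir / "meta.toml").read_text())
    if meta["spec_hash"] != current_spec_hash:                     # assumed field name
        if env == "prod":
            raise RuntimeError(f"Spec drift for {unit_id}")        # VibesafeHashMismatch in the real tool
        # dev mode: the real tool regenerates here instead of failing

    spec = importlib.util.spec_from_file_location(unit_id, chk_dir / "impl.py")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, unit_id.rsplit("/", 1)[-1])
```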
Vibesafe uses a pluggable provider system. Phase 1 ships with openai-compatible, which works with:
- OpenAI (GPT-4o, GPT-4o-mini)
- Anthropic (via OpenAI-compatible proxy)
- Local LLMs (llama.cpp, vLLM, Ollama with OpenAI API)
Provider interface:
```python
from typing import Protocol

class Provider(Protocol):
    def complete(
        self,
        prompt: str,
        system: str | None = None,
        seed: int = 42,
        temperature: float = 0.0,
        max_tokens: int | None = None,
        **kwargs,
    ) -> str:
        """Return generated code as a string."""
        ...
```
Adding providers:

Implement the `Provider` protocol and register it in `vibesafe.toml`:
```toml
[provider.anthropic]
kind = "anthropic-native"
model = "claude-3-5-sonnet-20250131"
api_key_env = "ANTHROPIC_API_KEY"
```

Dev mode (`env = "dev"`):
1. Import triggers `load_active(unit_id)`
2. Read `.vibesafe/index.toml` for the active checkpoint hash
3. Compute the current spec hash `H_spec`
4. If `H_spec` ≠ checkpoint's spec hash:
   - Warn: "Spec drift detected, regenerating..."
   - Auto-run `vibesafe compile --target <unit>`
   - Load the new checkpoint
5. Return the function object
Prod mode (`env = "prod"` or `VIBESAFE_ENV=prod`):

1. Import triggers `load_active(unit_id)`
2. Read `.vibesafe/index.toml` for the active checkpoint hash
3. If no checkpoint: raise `VibesafeCheckpointMissing`
4. Load checkpoint metadata from `meta.toml`
5. Compute the current spec hash `H_spec`
6. If `H_spec` ≠ checkpoint's spec hash: raise `VibesafeHashMismatch`
7. Return the function object
This enforces:
- ✅ What you tested is what runs (no silent regeneration)
- ✅ Drift is caught before deployment
- ✅ Reproducibility across environments
1. CI/CD gating:
```yaml
# .github/workflows/ci.yml
jobs:
  vibesafe-check:
    runs-on: ubuntu-latest
    steps:
      - run: vibesafe diff
        # Fails if any unit has drifted
      - run: vibesafe test
        # Runs all doctests + type/lint gates
      - run: vibesafe save --dry-run
        # Verifies all checkpoints exist
```

In 6 months of use, this caught 23 unintended spec changes (typos in doctests, accidental signature edits) before merge.
2. Frozen HTTP dependencies:
```bash
# Before deploying a FastAPI app
vibesafe save --target api.routes --freeze-http-deps
git add requirements.vibesafe.txt .vibesafe/checkpoints/
git commit -m "Lock FastAPI endpoint dependencies"
```

The meta.toml records:
```toml
[deps]
fastapi = "0.115.2"
starlette = "0.41.2"
pydantic = "2.9.1"
```

Now your containerized deployment uses the exact versions that passed tests, preventing "works on my laptop" bugs.
3. Prompt regression coverage:
Every time you change a spec, the hash changes. This creates a natural test suite for prompt engineering:
```bash
# After editing prompts/function.j2
vibesafe compile --force  # Regenerate all units
vibesafe test             # Verify all doctests still pass
vibesafe diff             # Review generated code changes
```

If a prompt change breaks existing specs, doctests fail immediately. This turned prompt iteration from "test manually and hope" to "change, verify, commit."
4. Local agents + vibesafe.toml contract:
The vibesafe.toml file is the single source of truth for:
- Which model to use
- What temperature/seed settings
- Where checkpoints live
- Which prompt templates apply
Local AI coding agents (Claude Code, Cursor, GitHub Copilot) can read vibesafe.toml and understand the contract without asking the developer. Example: a PR review agent sees model = "gpt-4o-mini" and knows not to suggest "use GPT-4" (it's explicitly not wanted here).
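Because the contract is plain TOML, an agent or script can read it with the standard library, using the keys from the configuration reference above:

```python
import tomllib
from pathlib import Path

config = tomllib.loads(Path("vibesafe.toml").read_text())
provider = config["provider"]["default"]
print(provider["model"], provider.get("temperature"), provider.get("seed"))
# e.g. "gpt-4o-mini 0.0 42"; the agent now knows the model contract without asking
```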
The examples/ directory doubles as regression fixtures:
```bash
$ tree examples/
examples/
├── math/
│   └── ops.py        # sum_str, fibonacci, is_prime
└── api/
    └── routes.py     # sum_endpoint, hello_endpoint

$ vibesafe test --target examples.math.ops
✓ sum_str     [3 doctests]
✓ fibonacci   [4 doctests]
✓ is_prime    [5 doctests]
[PASS] 3/3 units
```

These examples serve three purposes:
- Documentation: Show real usage patterns
- Testing: Verify vibesafe's own codegen pipeline
- Fixtures: Golden tests for prompt/model changes
| Feature | Status | Notes |
|---|---|---|
| Python 3.12+ support | ✅ | Tested on 3.12, 3.13 |
| `@vibesafe.func` decorator | ✅ | Pure function generation |
| `@vibesafe.http` decorator | ✅ | FastAPI endpoint generation |
| Doctest verification | ✅ | Auto-extracted from docstrings |
| Type checking (mypy) | ✅ | Mandatory gate before save |
| Linting (ruff) | ✅ | Enforces style consistency |
| Hash-locked checkpoints | ✅ | SHA-256 content addressing |
| Drift detection | ✅ | `vibesafe diff` command |
| OpenAI-compatible providers | ✅ | Works with OpenAI, proxies, local LLMs |
| CLI (scan, compile, test, save, status, diff, check) | ✅ | `vibesafe` or `vibe` alias |
| Dependency freezing | ✅ | `--freeze-http-deps` flag |
| Jinja2 prompt templates | ✅ | Customizable via `vibesafe.toml` |
| LLM response caching | ✅ | Keyed by spec hash, speeds up iteration |
| Subprocess sandbox | ✅ | Optional isolation for test runs |
Current coverage: 150+ checkpointed functions across 3 internal projects, 95% test coverage for vibesafe core.
Phase 2 (In Progress) — See ROADMAP.md
- Interactive REPL (`vibesafe repl --target <unit>`)
  - Commands: `gen`, `tighten`, `diff`, `save`, `rollback`
  - Planned Q2 2025
- Property-based testing (Hypothesis integration)
  - Extract `hypothesis:` fences from docstrings
  - Auto-generate property tests
- Multi-provider support (Anthropic native, Gemini, local inference)
- Advanced dependency tracing (hybrid static + runtime)
- Web UI dashboard (checkpoint browser, diff viewer)
- Sandbox enhancements (network/FS isolation, resource limits)
- PyPI package release (`pip install vibesafe`)
- Documentation site (Docusaurus on GitHub Pages)
- VS Code extension (syntax highlighting for `@vibesafe` specs)
- Performance benchmarks (compilation time, test throughput)
- Migration guide (v0.1 → v0.2)
Contributions welcome! Please:
- Open an issue first for features/bugs
- Follow the spec in SPEC.md
- Add tests for new functionality
- Update TODOS.md if you complete a roadmap item
Development setup:
```bash
git clone https://github.com/julep-ai/vibesafe.git
cd vibesafe
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"

# Run tests
pytest -n auto

# Type check
mypy src/vibesafe

# Lint
ruff check src/ tests/ examples/

# Format
ruff format src/ tests/ examples/
```

Claude-powered CI:
This repo uses Claude Code for automated PR reviews and test failure analysis. See .github/CLAUDE_ACTIONS.md for setup.
- ✅ Iteration speed: Dev mode auto-regenerates on import, no manual compile step
- ✅ Reproducibility: Same spec = same hash = same code
- ✅ Testability: Doctests are mandatory, enforced at save time
- ✅ Prod safety: Hash mismatches block execution, preventing drift
- ❌ Complex state machines: Specs are per-function, not multi-step workflows (use orchestration layer)
- ❌ Dynamic prompt injection: Templates are static Jinja2, not runtime-constructed (by design, for reproducibility)
- ❌ Multi-language support: Python-only (Rust/TypeScript on roadmap if demand exists)
- ❌ GUI for non-coders: CLI-first tool, requires Python knowledge
- Exploratory prototyping: If you're not sure what the API should be, write it manually first
- Performance-critical code: LLM-generated implementations may not be optimal (profile before deploying)
- Regulatory/compliance code: Review generated code manually; vibesafe ensures reproducibility, not correctness
- Sub-second latency requirements: Checkpoint loading adds ~10ms overhead on first import
MIT — see LICENSE
Built with:
- uv — Fast Python package manager
- ruff — Fast Python linter
- mypy — Static type checker
- pytest — Testing framework
- Jinja2 — Prompt templating
Inspired by:
- Defunctionalization (Reynolds, 1972) — Making implicit control explicit
- Content-addressed storage (Git, Nix) — Deterministic builds via hashing
- Test-driven development — Specs as executable contracts
- Literate programming (Knuth) — Code that explains itself
- Issues: github.com/julep-ai/vibesafe/issues
- Discussions: github.com/julep-ai/vibesafe/discussions
- Email: [email protected]