Interactive Phoenix LiveView demonstrations of the Crucible Framework
This project showcases the Crucible Framework's components through mock LLM scenarios with real framework orchestration. Experience ensemble voting, request hedging, statistical analysis, causal tracing, and production monitoring through beautiful, real-time visualizations.
Crucible Examples is a live, interactive demo application that helps you:
- Understand how Crucible's reliability tools work in practice
- Visualize statistical rigor applied to LLM evaluation
- Experience real-time ensemble voting, hedging, and optimization
- Learn best practices for production ML systems
- Mock LLM calls simulate realistic model responses, latency, and costs
- Real Crucible Framework orchestration, statistics, and analysis
- No API keys required - everything runs locally with simulated data
- Real-time visualization of framework components in action
Scenario: Medical diagnosis assistant with high-stakes decisions
- Watch 5 models respond to the same query
- See voting strategies (majority, weighted, unanimous) in action
- Real-time consensus metrics and confidence intervals
- Cost/accuracy trade-off visualization
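The voting idea behind this demo can be sketched in a few lines. This is an illustrative majority vote over mock responses, not the actual Crucible Ensemble API; model names and the `agreement` field are assumptions for the example.

```elixir
# Hypothetical sketch of majority voting over mock model responses —
# not the actual Crucible Ensemble API.
defmodule VotingSketch do
  # Each response is {model_name, answer}; the majority vote picks the
  # most frequent answer and reports agreement as a confidence proxy.
  def majority(responses) do
    {answer, votes} =
      responses
      |> Enum.frequencies_by(fn {_model, answer} -> answer end)
      |> Enum.max_by(fn {_answer, count} -> count end)

    %{answer: answer, agreement: votes / length(responses)}
  end
end

VotingSketch.majority([
  {"gpt-4", "B"}, {"claude-3", "B"}, {"gemini-pro", "B"},
  {"llama-3", "A"}, {"mixtral", "B"}
])
# => %{answer: "B", agreement: 0.8}
```

Weighted and unanimous strategies follow the same shape: weight each vote by a per-model score, or require `agreement == 1.0` before accepting the answer.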
Scenario: Customer support chatbot with latency requirements
- Visualize tail latency reduction (P99 improvements)
- Compare hedging strategies (fixed, adaptive, percentile-based)
- Live latency histograms and firing indicators
- Cost efficiency metrics (latency improvement per $)
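The core hedging mechanic this demo visualizes can be sketched with `Task`: fire the primary request, and if it hasn't answered within the hedge delay, fire a backup and take whichever finishes first. This is an illustrative sketch, not the Crucible Hedging API; in practice the delay would come from a latency percentile (e.g. P95) rather than a fixed constant.

```elixir
# Minimal hedging sketch using Task.yield — illustrative only.
defmodule HedgeSketch do
  def hedged_request(fun, hedge_delay_ms) do
    primary = Task.async(fun)

    case Task.yield(primary, hedge_delay_ms) do
      {:ok, result} ->
        {:primary, result}

      nil ->
        # Primary is slow: fire a backup and take whichever finishes
        # first (assumes at least one completes within 5 seconds).
        backup = Task.async(fun)

        {winner, result} =
          [primary, backup]
          |> Task.yield_many(:timer.seconds(5))
          |> Enum.find_value(fn
            {task, {:ok, result}} -> {task, result}
            _ -> nil
          end)

        if winner == primary, do: {:primary, result}, else: {:hedge, result}
    end
  end
end

HedgeSketch.hedged_request(fn -> :timer.sleep(10); :fast end, 100)
# => {:primary, :fast}
```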
Scenario: A/B test comparing two models on math problems
- Configure experiment parameters
- Watch statistical analysis in real-time
- See t-tests, effect sizes, and confidence intervals
- Generate publication-ready reports
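The arithmetic behind the live analysis is ordinary two-sample statistics. Crucible Bench provides the real implementation; this sketch just shows what a Welch's t statistic and Cohen's d compute for two accuracy samples.

```elixir
# Sketch of the statistics the demo reports — not Crucible Bench itself.
defmodule StatsSketch do
  def mean(xs), do: Enum.sum(xs) / length(xs)

  # Sample variance (n - 1 denominator).
  def variance(xs) do
    m = mean(xs)
    Enum.sum(Enum.map(xs, fn x -> (x - m) * (x - m) end)) / (length(xs) - 1)
  end

  # Welch's t: difference in means over the combined standard error.
  def welch_t(a, b) do
    se = :math.sqrt(variance(a) / length(a) + variance(b) / length(b))
    (mean(a) - mean(b)) / se
  end

  # Cohen's d with a pooled standard deviation.
  def cohens_d(a, b) do
    pooled = :math.sqrt((variance(a) + variance(b)) / 2)
    (mean(a) - mean(b)) / pooled
  end
end

StatsSketch.cohens_d([0.94, 0.93, 0.95], [0.89, 0.90, 0.88])
```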
Scenario: Multi-step reasoning with decision transparency
- Interactive timeline of LLM reasoning process
- Explore alternatives considered at each step
- Uncertainty tracking over time
- Searchable event history
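One way such a trace event might be shaped is a struct per reasoning step recording the decision taken, the alternatives considered, and a confidence estimate. This is an illustrative structure, not Crucible Trace's actual schema.

```elixir
# Hypothetical trace-event shape — not Crucible Trace's real schema.
defmodule TraceSketch do
  defstruct [:step, :decision, :alternatives, :confidence, :timestamp]

  # Flag steps whose confidence falls below an arbitrary threshold.
  def uncertain?(%__MODULE__{confidence: c}), do: c < 0.7
end

event = %TraceSketch{
  step: 2,
  decision: "retrieve supporting context",
  alternatives: ["answer directly", "ask a clarifying question"],
  confidence: 0.62,
  timestamp: ~U[2024-01-01 00:00:00Z]
}

TraceSketch.uncertain?(event)
# => true
```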
Scenario: Continuous model health monitoring
- 30-day historical baseline visualization
- Automated degradation detection with statistical tests
- Alert system based on confidence intervals
- Retraining trigger recommendations
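The alert logic this dashboard animates can be reduced to one comparison: flag degradation when recent accuracy falls below the baseline's confidence interval. The sketch below uses an approximate 95% CI on the baseline mean; the demo itself relies on Crucible's statistical tests rather than this hand-rolled check.

```elixir
# Illustrative degradation check — the demo uses Crucible's tests instead.
defmodule MonitorSketch do
  def degraded?(baseline_accs, recent_acc) do
    n = length(baseline_accs)
    mean = Enum.sum(baseline_accs) / n

    var =
      Enum.sum(Enum.map(baseline_accs, fn x -> (x - mean) * (x - mean) end)) /
        (n - 1)

    # Lower bound of an approximate 95% CI on the baseline mean.
    lower = mean - 1.96 * :math.sqrt(var / n)
    recent_acc < lower
  end
end

MonitorSketch.degraded?(List.duplicate(0.9, 30), 0.85)
# => true
```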
Scenario: Systematic prompt parameter search
- Define variable search spaces (temperature, tokens, examples)
- Choose optimization strategy (grid, random, Bayesian)
- Watch convergence in real-time
- Validate optimal configuration
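The search loop itself is simple; here is a random-search sketch over a small parameter space. The parameter names and the toy score function are assumptions for illustration, not the demo's actual config keys or evaluation.

```elixir
# Illustrative random search over a prompt-parameter space.
defmodule SearchSketch do
  @space %{temperature: [0.0, 0.3, 0.7, 1.0], max_tokens: [128, 256, 512]}

  # Sample `trials` random configs, score each, keep the best.
  def random_search(score_fun, trials) do
    1..trials
    |> Enum.map(fn _ ->
      config = Map.new(@space, fn {k, vals} -> {k, Enum.random(vals)} end)
      {config, score_fun.(config)}
    end)
    |> Enum.max_by(fn {_config, score} -> score end)
  end
end

# Toy score: prefer low temperature and a larger token budget.
{best, _score} =
  SearchSketch.random_search(
    fn cfg -> cfg.max_tokens / 512 - cfg.temperature end,
    200
  )
```

Grid search enumerates the space exhaustively instead of sampling; Bayesian optimization replaces the random sampler with a model that proposes promising configs.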
- Elixir 1.15+ and Erlang/OTP 26+
- Node.js 18+ (for Phoenix assets)
- Phoenix Framework 1.8+
```bash
# Clone the repo (if not already)
cd North-Shore-AI

# Navigate to crucible_examples
cd crucible_examples

# Install dependencies
mix deps.get

# Install Node.js dependencies
cd assets && npm install && cd ..

# Start the Phoenix server
mix phx.server
```

Now visit `localhost:4000` in your browser.
```
crucible_examples/
├── lib/
│   ├── crucible_examples/           # Core application logic
│   │   ├── mock/                    # Mock LLM system
│   │   │   ├── models.ex            # Simulated model responses
│   │   │   ├── latency.ex           # Realistic latency distributions
│   │   │   ├── datasets.ex          # Mock benchmark questions
│   │   │   └── pricing.ex           # Cost tracking
│   │   ├── scenarios/               # Demo scenario implementations
│   │   │   ├── ensemble_demo.ex     # Ensemble voting demo
│   │   │   ├── hedging_demo.ex      # Request hedging demo
│   │   │   ├── stats_demo.ex        # Statistical comparison demo
│   │   │   ├── trace_demo.ex        # Causal trace demo
│   │   │   ├── monitoring_demo.ex   # Production monitoring demo
│   │   │   └── optimization_demo.ex # Optimization demo
│   │   └── telemetry/               # Event collection
│   └── crucible_examples_web/       # Phoenix web interface
│       ├── live/                    # LiveView modules
│       │   ├── home_live.ex         # Homepage
│       │   ├── ensemble_live.ex     # Ensemble demo LiveView
│       │   ├── hedging_live.ex      # Hedging demo LiveView
│       │   ├── stats_live.ex        # Stats demo LiveView
│       │   ├── trace_live.ex        # Trace explorer LiveView
│       │   ├── monitoring_live.ex   # Monitoring dashboard LiveView
│       │   └── optimization_live.ex # Optimization playground LiveView
│       ├── components/              # Reusable components
│       │   ├── charts.ex            # Chart components
│       │   ├── metrics_card.ex      # Metrics display
│       │   ├── model_card.ex        # Model response cards
│       │   └── timeline.ex          # Timeline visualization
│       └── router.ex                # Route definitions
└── assets/                          # Frontend assets
    ├── css/
    │   └── app.css                  # Tailwind CSS
    └── js/
        ├── app.js
        └── hooks/                   # LiveView hooks
            ├── chart_hook.js        # Chart rendering
            └── timeline_hook.js     # Timeline interactions
```
All examples use a sophisticated mock system that simulates realistic LLM behavior:
- GPT-4: High accuracy (94%), expensive ($0.005/query), medium latency
- Claude-3: High accuracy (93%), medium cost ($0.003/query), fast
- Gemini-Pro: Good accuracy (90%), cheap ($0.001/query), medium latency
- Llama-3: Decent accuracy (87%), very cheap ($0.0002/query), fast
- Mixtral: Good accuracy (89%), cheap ($0.0008/query), medium latency
- Latency distributions: P50/P95/P99 modeled after real-world data
- Tail latency: Occasional slow requests (10% of queries)
- Response variation: Models disagree ~15% of the time
- Cost tracking: Accurate per-model pricing
- Error injection: Rare failures (<1%) for robustness testing
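One way the profiles above might be represented is a map of per-model parameters driving each simulated call. The keys mirror the table but are illustrative, not the actual `CrucibleExamples.Mock.Models` internals; the P50 latencies are assumptions standing in for "fast" and "medium".

```elixir
# Illustrative mock-model profiles — not the real Mock.Models internals.
defmodule MockSketch do
  @models %{
    "gpt-4"      => %{accuracy: 0.94, cost: 0.005,  p50_ms: 800},
    "claude-3"   => %{accuracy: 0.93, cost: 0.003,  p50_ms: 500},
    "gemini-pro" => %{accuracy: 0.90, cost: 0.001,  p50_ms: 700},
    "llama-3"    => %{accuracy: 0.87, cost: 0.0002, p50_ms: 400},
    "mixtral"    => %{accuracy: 0.89, cost: 0.0008, p50_ms: 650}
  }

  # Simulate one call: correct with probability `accuracy`, billed at `cost`.
  def call(model) do
    profile = Map.fetch!(@models, model)
    correct? = :rand.uniform() < profile.accuracy
    %{model: model, correct?: correct?, cost_usd: profile.cost}
  end
end

MockSketch.call("gpt-4")
```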
- Phoenix Framework: Web application framework
- Phoenix LiveView: Real-time, server-rendered UI
- Tailwind CSS: Utility-first styling
- Contex: Elixir charting library
- Crucible Framework: Statistical testing and LLM reliability tools
- Crucible Bench - Statistical testing framework
- Crucible Ensemble - Multi-model voting
- Crucible Hedging - Tail latency reduction
- Crucible Trace - Causal reasoning traces
- Crucible Telemetry - Research-grade instrumentation
- Crucible Harness - Experiment orchestration
Each demo includes:
- Scenario description: Real-world context
- What it demonstrates: Key framework features
- How it works: Technical explanation
- Try it yourself: Interactive controls
```bash
# Run all tests
mix test

# Run specific test file
mix test test/crucible_examples/mock/models_test.exs

# Run with coverage
mix test --cover
```

Test Suite: 85 comprehensive tests covering:
- Mock system (Models, Latency, Datasets, Pricing) - 80 tests
- Web controllers and views - 5 tests
- All tests passing with zero warnings
The examples/ directory contains runnable scripts demonstrating each feature:
```bash
# Mock system demonstrations
mix run examples/mock_models_demo.exs
mix run examples/latency_demo.exs
mix run examples/datasets_demo.exs

# Interactive demo scripts
mix run examples/ensemble_demo.exs
mix run examples/hedging_demo.exs
mix run examples/stats_demo.exs
```

Each example script:
- Runs independently without the web server
- Demonstrates core framework capabilities
- Produces detailed console output
- Can be used for testing and benchmarking
```bash
# Format code
mix format

# Build assets and a production release
mix assets.deploy
MIX_ENV=prod mix release
```

This project can be deployed to:

- Fly.io: `fly launch` (recommended)
- Heroku: Standard Phoenix deployment
- Gigalixir: Elixir-optimized platform
- Docker: Included Dockerfile
This is a demonstration project for the Crucible Framework. Contributions welcome:
- Fork the repository
- Create a feature branch
- Add new demo scenarios or improve existing ones
- Submit a pull request
MIT License - see LICENSE for details
Built with:
- Crucible Framework - Scientific LLM evaluation
- Phoenix Framework - Productive web framework
- Phoenix LiveView - Real-time experiences
Ready to explore? Start the server with `mix phx.server` and visit `localhost:4000`.