LLM Gateway

An intelligent reverse proxy for routing requests to multiple Large Language Model (LLM) providers with real-time monitoring and cost optimization.

Features

  • Intelligent Routing: Automatically selects providers using a cost-optimized algorithm that prioritizes zero-cost providers and selects the lowest-cost paid options when needed.
  • Real-time Monitoring: A Vue.js dashboard provides a live view of usage statistics, including requests, tokens, and cost per provider.
  • Dynamic Configuration: Update provider settings, models, and limits from the UI without restarting the server.
  • Extensive Provider Support: Natively supports a wide range of LLM providers through the Vercel AI SDK, including OpenAI, Anthropic, Google, Groq, Mistral, claude-code, gemini-cli, and any OpenAI-compatible API.
  • Unified API: A single, consistent OpenAI-compatible endpoint for all backend providers.

Getting Started (Development)

Prerequisites

  • Node.js 22+
  • npm (comes with Node.js)

Setup

  1. Clone the repository:

    git clone https://github.com/mcowger/costrouter.git
    cd costrouter
  2. Install dependencies:

    npm install
  3. Create and configure your settings:

    • Create a config directory: mkdir config
    • Optional: Copy the example configuration: cp config.test.jsonc config/config.jsonc. If you don't create a config, the gateway starts with an empty configuration that you can edit via the UI.
    • Optional: Edit config/config.jsonc and add your provider API keys and settings (a hypothetical sketch of the typical fields follows this list).
  4. Run the application:

    npm run dev

    This command starts both the backend server (on port 3000) and the frontend UI (on port 5173) concurrently.

  5. Access the Dashboard: Open http://localhost:5173 in your browser.
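
The configuration schema itself is not documented in this README, but based on the features described above, a provider entry generally needs an API key, the models it serves (or their mapped names), pricing per million tokens, and rate limits. The TypeScript interface below is a hypothetical sketch of those fields, not the actual format; consult config.test.jsonc for the authoritative example.

// Hypothetical illustration only -- field names are assumptions, not the
// gateway's actual schema. See config.test.jsonc for the real format.
interface ProviderConfigSketch {
  name: string;                    // e.g. "openai", "anthropic", "groq"
  apiKey: string;
  models: string[];                // models this provider serves (or mapped names)
  pricing?: {
    inputCostPerMillion: number;   // 0 marks a zero-cost provider
    outputCostPerMillion: number;
  };
  rateLimits?: {
    requestsPerMinute?: number;
    tokensPerMinute?: number;
  };
}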

Running with Docker

  1. Build the Docker image:

    npm run docker-build
  2. Create a config directory (optional):

    mkdir -p ./config
    cp config.test.jsonc ./config/config.jsonc  # Optional: copy example config
  3. Run the Docker container:

    docker run --rm -p 3000:3000 -v $(pwd)/config:/config --name llm-gateway-container llm-gateway
    • The gateway will be accessible on http://localhost:3000.
    • The UI is served from the same port at the / route.
    • Configuration is mounted from your host machine. If you don't have an existing config file, an empty one will be created for you.

Basic Usage

Send requests to the gateway's OpenAI-compatible endpoint:

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-string-is-valid" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
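
Because the endpoint is OpenAI-compatible, existing OpenAI client libraries can talk to the gateway by overriding their base URL. The TypeScript sketch below assumes the official openai npm package; since the gateway accepts any bearer token, the API key value is arbitrary.

import OpenAI from "openai";

// Point a standard OpenAI client at the gateway instead of api.openai.com.
// The gateway accepts any bearer token, so apiKey can be any string.
const client = new OpenAI({
  baseURL: "http://localhost:3000/v1",
  apiKey: "any-string-is-valid",
});

const response = await client.chat.completions.create({
  model: "gpt-4", // routed to whichever configured provider serves this model
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);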

Provider Selection Algorithm

The gateway uses an intelligent, cost-optimized routing algorithm that works as follows:

  1. Model Matching: Identifies all providers that support the requested model (by name or mapped name)
  2. Rate Limit Filtering: Removes providers that have exceeded their configured rate limits
  3. Cost-Based Partitioning: Separates remaining providers into two groups:
    • Zero-cost providers: Those with all pricing fields explicitly set to 0
    • Paid providers: All others, including those with undefined/unknown pricing
  4. Selection Strategy:
    • If zero-cost providers are available, randomly selects one to distribute load
    • Otherwise, selects the lowest-cost paid provider based on:
      • Primary sort: Input cost per million tokens
      • Secondary sort: Output cost per million tokens
      • Providers with undefined costs are deprioritized (sorted last)

This algorithm ensures cost optimization while maintaining high availability through intelligent failover.
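
To make the partitioning and sort order concrete, the minimal TypeScript sketch below walks through steps 3 and 4. It assumes the candidate list has already been filtered for model support and rate limits (steps 1 and 2); the field and function names are illustrative, not the actual Router implementation.

// Illustrative sketch of steps 3-4 above. Field and function names are
// assumptions, not the gateway's actual Router code. Candidates are assumed
// to already support the requested model and be within their rate limits.
interface CandidateProvider {
  name: string;
  inputCostPerMillion?: number;  // undefined = unknown pricing
  outputCostPerMillion?: number;
}

function selectProvider(candidates: CandidateProvider[]): CandidateProvider | undefined {
  // Step 3: partition into zero-cost vs. paid (unknown pricing counts as paid).
  const zeroCost = candidates.filter(
    (p) => p.inputCostPerMillion === 0 && p.outputCostPerMillion === 0,
  );
  const paid = candidates.filter((p) => !zeroCost.includes(p));

  // Step 4a: prefer a random zero-cost provider to spread load.
  if (zeroCost.length > 0) {
    return zeroCost[Math.floor(Math.random() * zeroCost.length)];
  }

  // Step 4b: otherwise pick the cheapest paid provider -- input cost first,
  // then output cost; undefined costs sort last.
  const cost = (v?: number) => (v === undefined ? Number.POSITIVE_INFINITY : v);
  return [...paid].sort(
    (a, b) =>
      cost(a.inputCostPerMillion) - cost(b.inputCostPerMillion) ||
      cost(a.outputCostPerMillion) - cost(b.outputCostPerMillion),
  )[0];
}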

Architecture

The gateway uses a pipeline pattern with singleton managers for core services:

  • ConfigManager: Loads and validates configuration.
  • UsageManager: Tracks and enforces rate limits in real-time.
  • Router: Selects the optimal provider for each incoming request using the cost-optimized algorithm.
  • UnifiedExecutor: Executes the request against the chosen provider using the Vercel AI SDK.
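
A rough sketch of how these pieces fit together for a single request is shown below; the interfaces and method names are assumptions for illustration, not the project's actual API.

// Minimal sketch of the request pipeline. These interfaces and method names
// are assumptions for illustration, not the gateway's actual API.
interface ConfigManager { getProviders(): string[] }
interface Router { selectProvider(model: string, providers: string[]): string }
interface UnifiedExecutor { execute(provider: string, body: unknown): Promise<{ text: string; tokens: number }> }
interface UsageManager { record(provider: string, tokens: number): void }

async function handleChatCompletion(
  model: string,
  body: unknown,
  deps: { config: ConfigManager; router: Router; executor: UnifiedExecutor; usage: UsageManager },
): Promise<string> {
  const providers = deps.config.getProviders();                  // validated configuration
  const provider = deps.router.selectProvider(model, providers); // cost-optimized selection
  const result = await deps.executor.execute(provider, body);    // Vercel AI SDK call
  deps.usage.record(provider, result.tokens);                    // real-time usage tracking
  return result.text;
}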

Technology Stack

  • Backend: Node.js, Express, TypeScript, Vercel AI SDK, Zod, Pino, rate-limiter-flexible
  • Frontend: Vue 3, Vite, Pinia, Vue Router
  • Build Tools: npm (use npx for binaries like tsc, tsx, vue-tsc, prettier), Docker, and the docker-build npm script.
