
Conversation

@iplanwebsites

No description provided.

Add support for running AI inference entirely in the browser using WebLLM,
without requiring an API key. Users can now select "WebLLM (Local)" from the
model dropdown to run inference locally using WebGPU.

Key changes:
- Add @built-in-ai/web-llm package for Vercel AI SDK integration
- Add webllm model option to models.ts with isLocal flag (a sketch follows this list)
- Create useWebLLMChat hook for client-side chat handling
- Create WebLLMChat component for local inference UI
- Create ChatWrapper to switch between API and local modes (sketched at the end of this description)
- Add /api/chat/webllm-save endpoint for persisting messages
- Add WebLLMStatus component showing download/loading progress
- Update entitlements to include webllm model for all users

Auto-formatting applied by biome linter for consistent code style.
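
A minimal sketch of what the models.ts entry could look like, assuming a plain chatModels array of model descriptors; the interface shape and the non-WebLLM entry below are illustrative, not copied from the repository:

```ts
// Illustrative only: the ChatModel interface and the server-side entry are
// assumptions about models.ts, not the actual code from this PR.
export interface ChatModel {
  id: string;
  name: string;
  description: string;
  // When true, inference runs in the browser via WebLLM/WebGPU instead of
  // going through the server-side chat API.
  isLocal?: boolean;
}

export const chatModels: ChatModel[] = [
  {
    id: 'chat-model',
    name: 'Chat model',
    description: 'Server-side model routed through the AI gateway',
  },
  {
    id: 'webllm-standard',
    name: 'WebLLM (Local)',
    description: 'Runs entirely in the browser via WebGPU; no API key required',
    isLocal: true,
  },
];
```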

Replace single WebLLM model with quality-based options (mapping sketched after this list):
- webllm-draft: Fastest, uses Qwen3-0.6B
- webllm-standard: Balanced, uses Llama-3.2-3B
- webllm-high: Better quality, uses Qwen3-4B
- webllm-best: Best quality, uses Llama-3.1-8B
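
A sketch of how these tiers might map onto prebuilt WebLLM model IDs; the concrete MLC identifiers below are assumptions and would need to match whatever @built-in-ai/web-llm actually exposes:

```ts
// Assumed mapping from quality tier to WebLLM model ID; the exact identifiers
// are illustrative, not taken from the PR.
export const WEBLLM_MODEL_IDS: Record<string, string> = {
  'webllm-draft': 'Qwen3-0.6B-q4f16_1-MLC', // fastest, smallest download
  'webllm-standard': 'Llama-3.2-3B-Instruct-q4f16_1-MLC', // balanced speed/quality
  'webllm-high': 'Qwen3-4B-q4f16_1-MLC', // better quality, larger download
  'webllm-best': 'Llama-3.1-8B-Instruct-q4f16_1-MLC', // best quality, heaviest download
};
```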

Users can now easily toggle between server-side (API gateway) and
client-side (WebLLM) inference by selecting different models from
the dropdown. WebLLM models run entirely in the browser without
requiring an API key.
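
A minimal sketch of the ChatWrapper switch under the assumptions above; the component names follow the description, but the props and import paths are guesses:

```tsx
// Sketch only: props and import paths are assumptions, not the PR's actual code.
import { chatModels } from '@/lib/ai/models';
import { Chat } from '@/components/chat';
import { WebLLMChat } from '@/components/webllm-chat';

export function ChatWrapper(props: { id: string; selectedModelId: string }) {
  const model = chatModels.find((m) => m.id === props.selectedModelId);

  // Local models bypass the server route and run inference in the browser;
  // finished messages are persisted separately via /api/chat/webllm-save.
  if (model?.isLocal) {
    return <WebLLMChat {...props} />;
  }

  // Every other model goes through the server-side API gateway as before.
  return <Chat {...props} />;
}
```
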
@vercel

vercel bot commented Nov 26, 2025

@claude is attempting to deploy a commit to the Vercel Team on Vercel.

A member of the Team first needs to authorize it.

@socket-security

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|------|---------|-----------------------|---------------|---------|-------------|---------|
| Added | @built-in-ai/web-llm@0.3.1 | 77 | 100 | 100 | 88 | 100 |

View full report
