mlc-ai/web-llm

★ 17,967 | TypeScript | Audience · developer | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((web-llm))
    What it does
      LLMs in browser
      No server needed
      Private on-device AI
    Tech
      TypeScript
      WebGPU
      Web Workers
    Models supported
      Llama 3
      Phi 3
      Gemma
      Mistral
    Use cases
      Offline AI apps
      Privacy-first tools
      Chrome extensions


Things people build with this

USE CASE 1

Add AI chat to a web app without paying for a cloud API or sending any user data to a server.

USE CASE 2

Replace OpenAI API calls in an existing web app with on-device AI by swapping in WebLLM and keeping the same calling code.

USE CASE 3

Build a Chrome extension with built-in AI that works offline using local model inference in the browser.

USE CASE 4

Prototype an AI-powered web tool quickly by loading WebLLM from a CDN in JSFiddle or CodePen.

Tech stack

TypeScript, JavaScript, WebGPU, Web Workers, NPM

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires a browser with WebGPU support, such as Chrome 113 or later. It does not work in Safari or Firefox without experimental flags.
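A page can check for WebGPU before it tries to load a model. The sketch below assumes the standard `navigator.gpu` entry point. The `NavigatorLike` type and `hasWebGpu` helper are hypothetical names used for illustration and are not part of WebLLM.

```typescript
// Minimal structural type so the helper can be exercised outside a browser.
// `NavigatorLike` and `hasWebGpu` are illustrative names, not WebLLM APIs.
interface NavigatorLike {
  gpu?: unknown;
}

// Returns true when the environment exposes the WebGPU entry point.
// Presence of `gpu` does not guarantee a usable adapter; a fuller check
// would also await `navigator.gpu.requestAdapter()`.
function hasWebGpu(nav: NavigatorLike | undefined): boolean {
  return nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
}

// In a browser, pass the global navigator object and show a fallback
// message when the result is false.
```

A check like this lets the page explain the browser requirement up front instead of failing partway through a large model download.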

In plain English

WebLLM lets you run large language models, the kind of AI that powers chatbots like ChatGPT, directly inside a web browser, with no server doing the work behind the scenes. Everything happens on the user's own machine, accelerated by WebGPU, a modern browser standard that lets web pages use the computer's graphics card for fast computation.

The project is designed as a drop-in replacement for the OpenAI API. If you have an app that already calls the OpenAI API, you can point it at WebLLM and keep the same calling code, including features like streaming responses and structured JSON output. Function calling is listed as a work in progress. WebLLM ships with support for several open-source model families, including Llama 3, Phi 3, Gemma, Mistral, and Qwen, and you can compile and load your own custom models in the MLC format.

You would reach for WebLLM if you want to ship AI features in a web app without paying for a cloud API or sending data off the user's device, or if offline use matters for your audience. It can offload work to Web Workers or Service Workers so the user interface stays responsive, and it can be packaged into Chrome extensions.

The package is written in TypeScript and published on NPM. It can also be loaded straight from a CDN for quick prototyping in tools like JSFiddle or CodePen. It is a companion to the broader MLC LLM project, which targets the same models across other hardware environments. This summary covers only part of the full README.
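To illustrate the OpenAI-style streaming described above, here is a minimal sketch of collecting streamed chunks into text. It assumes chunks follow the OpenAI chat-completion chunk shape (`choices[0].delta.content`). The `ChatChunk` type and the `appendDelta` and `collectStream` helpers are hypothetical names, and the engine calls shown in the trailing comment should be checked against the repo.

```typescript
// Subset of the OpenAI-style streaming chunk shape (an assumption; only the
// fields used here are modeled).
interface ChatChunk {
  choices: Array<{ delta: { content?: string | null } }>;
}

// Append the text carried by one chunk, tolerating empty or missing deltas.
function appendDelta(current: string, chunk: ChatChunk): string {
  const piece = chunk.choices[0]?.delta?.content ?? "";
  return current + piece;
}

// Consume any async stream of chunks, reporting the growing text as it arrives.
async function collectStream(
  chunks: AsyncIterable<ChatChunk>,
  onText: (text: string) => void,
): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    text = appendDelta(text, chunk);
    onText(text); // e.g. assign to a text area's value in a real page
  }
  return text;
}

// With WebLLM, the stream would come from something like this
// (a sketch; verify names and options against the repo):
//   const engine = await CreateMLCEngine(modelId);
//   const chunks = await engine.chat.completions.create({ messages, stream: true });
//   await collectStream(chunks, (t) => { output.value = t; });
```

Keeping `appendDelta` a pure function means the text-assembly logic can be tested without a model, a GPU, or a browser.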

Copy-paste prompts

Prompt 1

Show me how to add WebLLM to a web page so a user can type a question and get a response from Llama 3 running entirely in their browser.

Prompt 2

How do I replace my existing OpenAI API calls with WebLLM so the same code runs on the user's device instead of a cloud server?

Prompt 3

Set up WebLLM to run inside a Web Worker so the page stays responsive while the model processes a long prompt.

Prompt 4

Show me how to stream a response from WebLLM token by token into a text area on my web page as the model generates it.

Prompt 5

How do I package WebLLM into a Chrome extension so users can chat with a local AI model without an internet connection?

Verify against the repo before relying on details.