explaingit

0hq/webgpt

3,785 | JavaScript | Audience · developer | Complexity · 3/5 | Setup · moderate

TLDR

A proof-of-concept that runs a GPT-2 language model entirely inside your web browser on the graphics card. No server or API key is required: it is just HTML and JavaScript in under 1500 lines of code.

Mindmap

mindmap
  root((WebGPT))
    What it does
      GPT in browser
      No server needed
      WebGPU powered
    Models
      Shakespeare small
      GPT-2 117M
    Code
      Under 1500 lines
      Plain JavaScript
      HTML only
    Requirements
      WebGPU browser
      Git LFS weights

Things people build with this

USE CASE 1

Run a GPT-2 language model locally in the browser without any server, API key, or cloud dependency.

USE CASE 2

Learn how transformer models work by reading a minimal, self-contained WebGPU implementation under 1500 lines.

USE CASE 3

Generate text in the browser from a Shakespeare or GPT-2 model without sending any data to external services.

Tech stack

JavaScript · HTML · WebGPU

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires a WebGPU-enabled browser (Chrome Canary or Edge Canary) and Git LFS installed to download model weight files.
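
Before loading any model files, it can help to confirm that the browser actually exposes WebGPU. The sketch below is not taken from the repo; `checkWebGPU` is a hypothetical helper name. It relies only on the standard `navigator.gpu.requestAdapter()` call, which resolves to `null` when no suitable adapter is available.

```javascript
// Hypothetical helper (not part of webgpt): reports whether a navigator-like
// object exposes WebGPU and whether a GPU adapter can be obtained.
async function checkWebGPU(nav) {
  // Browsers without WebGPU support do not define navigator.gpu at all.
  if (!nav || !nav.gpu) {
    return { supported: false, reason: "navigator.gpu is not available" };
  }
  // requestAdapter() resolves to null if no suitable adapter exists.
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "no suitable GPU adapter found" };
  }
  return { supported: true, reason: "WebGPU adapter acquired" };
}

// In a browser, pass the real navigator object and log the result.
if (typeof navigator !== "undefined") {
  checkWebGPU(navigator).then((result) => console.log(result));
}
```

Taking the navigator as a parameter keeps the check easy to exercise outside a browser with a stub object.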

In plain English

WebGPT is a proof-of-concept project that runs a GPT-style language model directly inside a web browser, with no server or cloud service doing the computation. It uses WebGPU, a newer browser technology that gives web pages access to the computer's graphics card for heavy number-crunching tasks. The entire implementation is written in plain JavaScript and HTML, coming in at under 1500 lines of code.

The project includes two pre-packaged models you can try: a small model trained on Shakespeare's texts (described by the author as undertrained) and a larger GPT-2 model with 117 million parameters. Benchmarks on an M1 Mac show the 117M model generating text at around 30 milliseconds per token. A token is a word or a piece of a word, so that works out to roughly 33 tokens per second.

Running it locally is straightforward because it is just HTML and JavaScript files. The main requirement is a browser that supports WebGPU, which was still rolling out to major browsers when this was written. Chrome Canary or Edge Canary are the recommended options. The model weight files are large and stored using Git LFS, a system for tracking big files in a code repository, so you need that installed to download them after cloning.

The project was built as a learning exercise. The author started with no background in how transformer models, GPU programming, or text tokenization work, and credits Andrej Karpathy's public video series for the foundational understanding. Code from the nanoGPT project and a JavaScript GPT tokenizer were also used as references. The roadmap in the README lists several remaining optimizations and open questions the author plans to address.
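
To put the numbers above in perspective, here is a small back-of-the-envelope sketch. The function names are hypothetical, and the 4 bytes per parameter figure assumes 32-bit floating-point weights, which the summary does not confirm. Treat the size as a rough estimate, not a measurement of the repo's files.

```javascript
// Convert a per-token latency in milliseconds into tokens per second.
function tokensPerSecond(msPerToken) {
  if (!(msPerToken > 0)) {
    throw new RangeError("msPerToken must be a positive number");
  }
  return 1000 / msPerToken;
}

// Estimate raw weight size in bytes.
// ASSUMPTION: 4 bytes per parameter (float32); the real storage format may differ.
function weightBytes(parameterCount, bytesPerParameter = 4) {
  return parameterCount * bytesPerParameter;
}

// 30 ms per token, as quoted for the 117M model on an M1 Mac.
console.log(tokensPerSecond(30).toFixed(1)); // prints "33.3"

// 117 million parameters at the assumed 4 bytes each, in decimal megabytes.
console.log((weightBytes(117e6) / 1e6).toFixed(0)); // prints "468"
```

Under that assumption the weights come to several hundred megabytes, which is consistent with storing them through Git LFS instead of directly in the repository.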

Copy-paste prompts

Prompt 1
Walk me through how webgpt uses WebGPU shaders to run a GPT-2 forward pass in the browser. Which files handle what?
Prompt 2
How do I run the webgpt GPT-2 117M model locally in my browser, and which browsers currently support WebGPU?
Prompt 3
Explain the tokenization approach used in webgpt and how it converts input text to token IDs before the model sees it.
Prompt 4
I want to load my own model weights into webgpt. What format do the weights need to be in, and how are they stored in the repo?
Prompt 5
How would I extend webgpt to run a larger model like GPT-2 345M, and what GPU memory limits should I watch for?

Verify against the repo before relying on details.