openai/gpt-oss

Analysis updated 2026-06-21

★ 20,095PythonAudience · developerComplexity · 4/5LicenseSetup · hard

Mindmap

mindmap
  root((gpt-oss))
    Models
      120b large model
      20b smaller model
    Capabilities
      Reasoning levels
      Function calling
      Structured output
      Web browsing
    Run locally
      Ollama two commands
      LM Studio GUI
      vLLM production
      Hugging Face
    Tech
      MoE architecture
      MXFP4 quantization
      Apache 2.0 license

mindmap root((gpt-oss)) Models 120b large model 20b smaller model Capabilities Reasoning levels Function calling Structured output Web browsing Run locally Ollama two commands LM Studio GUI vLLM production Hugging Face Tech MoE architecture MXFP4 quantization Apache 2.0 license

Click or tap to explore — scroll the page freely

What do people build with it?

USE CASE 1

Run a powerful 120B AI model locally on a single H100 or MI300X GPU for inference without OpenAI API costs.

USE CASE 2

Build a production AI API endpoint using vLLM to serve gpt-oss-20b on a 16GB GPU server.

USE CASE 3

Use function calling and structured outputs from a locally hosted model to build AI-powered tools without sending data externally.

USE CASE 4

Study the PyTorch reference implementation to understand Mixture-of-Experts architecture and MXFP4 quantization.

What is it built with?

PythonPyTorchTritonMetalvLLMOllama

How does it compare?

	openai/gpt-oss	hkuds/rag-anything	facebook/prophet
Stars	20,095	20,146	20,179
Language	Python	Python	Python
Setup difficulty	hard	moderate	moderate
Complexity	4/5	4/5	3/5
Audience	developer	developer	data

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard Time to first run · 30min

Requires an NVIDIA H100 or AMD MI300X GPU with 80GB VRAM for the 120b model, a 16GB GPU is sufficient for the 20b model.

Apache 2.0, use freely for any purpose including commercial, modify and redistribute with attribution, no copyleft restrictions.

In plain English

gpt-oss is a pair of open-weight AI language models released by OpenAI: gpt-oss-120b (a large model with 117 billion total parameters but only 5.1 billion active at once) and gpt-oss-20b (a smaller, faster model with 21 billion parameters). "Open-weight" means the model weights, the learned numerical values that define how the model thinks, are publicly downloadable and can be run on your own hardware, unlike OpenAI's proprietary models which require API access. Both models are Mixture-of-Experts (MoE) models, a design where only a fraction of the network activates for any given input. This makes the 120b model surprisingly efficient: despite its large size, it fits on a single NVIDIA H100 or AMD MI300X GPU (80GB of memory) because of MXFP4 quantization, a technique that compresses the model's numbers to use less memory. The 20b model runs within 16GB of memory, making it accessible on high-end consumer hardware. The models support reasoning with configurable effort levels (low, medium, or high), full access to the model's internal chain-of-thought, function calling, web browsing, Python code execution, and structured outputs. They use a specific "Harmony" message format that must be applied correctly for the models to work. You can run these models locally using Ollama (two commands to download and start), LM Studio, the Hugging Face Transformers library, or vLLM for production serving. The models are licensed under Apache 2.0, making them free to use commercially without copyleft restrictions. The repository also includes educational reference implementations in PyTorch, Triton, and Metal.

Copy-paste prompts

Prompt 1

I have an NVIDIA H100 80GB GPU. Show me the exact Ollama commands to download and run gpt-oss-120b with high reasoning effort and the correct Harmony message format.

Prompt 2

Set up vLLM to serve gpt-oss-20b as a local API endpoint. Include the config file and startup command.

Prompt 3

Write a Python script using Hugging Face Transformers to run gpt-oss-20b with function calling for a weather lookup tool.

Prompt 4

Explain the Mixture-of-Experts design in gpt-oss-120b: why does a 117B parameter model only activate 5.1B parameters at once, and how does MXFP4 reduce GPU memory usage?

Prompt 5

Help me run gpt-oss-20b in LM Studio on a machine with 16GB VRAM and set the reasoning effort to medium.

Frequently asked questions

What is gpt-oss?

Two open-weight AI models from OpenAI, a 120B and a 20B parameter model, downloadable and runnable on your own GPU without API access, licensed Apache 2.0.

What language is gpt-oss written in?

Mainly Python. The stack also includes Python, PyTorch, Triton.

What license does gpt-oss use?

Apache 2.0, use freely for any purpose including commercial, modify and redistribute with attribution, no copyleft restrictions.

How hard is gpt-oss to set up?

Setup difficulty is rated hard, with roughly 30min to a first successful run.

Who is gpt-oss for?

Mainly developer.

Open on GitHub → Explain another repo

This repo across BitVibe Labs

Scan in gitsafehub Deploy in gitdeployhub openai on gitmyhub

Verify against the repo before relying on details.