Run a private AI assistant on your own GPU without relying on OpenAI's API or paying per-token fees.
Build applications that need reasoning, web search, and code execution without external API calls.
Fine-tune or customize the model weights for domain-specific tasks using your own training data.
Deploy a production inference server using vLLM to serve multiple users with low latency.
The 120b model needs a GPU with at least 80GB of VRAM or a multi-GPU setup; downloading and quantizing models can take hours.
gpt-oss is a pair of open-weight language models released by OpenAI: gpt-oss-120b (117 billion total parameters, of which only 5.1 billion are active per token) and gpt-oss-20b (a smaller, faster model with 21 billion total parameters, 3.6 billion active per token). "Open-weight" means the model weights, the learned numerical values that determine the model's behavior, are publicly downloadable and can be run on your own hardware, unlike OpenAI's proprietary models, which require API access.

Both models use a Mixture-of-Experts (MoE) design, in which only a fraction of the network activates for any given input. Combined with MXFP4 quantization, a technique that stores the weights in a compact 4-bit format, this makes the 120b model efficient for its size: it fits on a single 80GB GPU such as an NVIDIA H100 or AMD MI300X. The 20b model runs within 16GB of memory, which puts it within reach of high-end consumer hardware.

The models support reasoning with configurable effort levels (low, medium, or high), full access to the model's chain-of-thought, function calling, web browsing, Python code execution, and structured outputs. They expect input in the "Harmony" message format and do not work correctly without it.

You can run the models locally with Ollama (two commands to download and start), LM Studio, or the Hugging Face Transformers library, or serve them in production with vLLM. The models are licensed under Apache 2.0, so they are free to use commercially without copyleft restrictions. The repository also includes educational reference implementations in PyTorch, Triton, and Metal.
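The memory figures above can be sanity-checked with rough arithmetic. The sketch below is a back-of-envelope estimate, not a measurement. It assumes every weight is stored in MXFP4 (a 4-bit value plus one 8-bit scale shared by each block of 32 values, so about 4.25 bits per weight) and ignores the KV cache, activations, and any tensors kept at higher precision, so real usage will differ. The helper name `weight_gb` and the `MODELS` table are illustrative, not part of the repository.

```python
# Rough weight-storage estimate. Assumption: all weights are MXFP4,
# which is a simplification; some tensors may be kept at higher precision.

MXFP4_BITS_PER_WEIGHT = 4 + 8 / 32  # 4-bit value + 8-bit scale per 32 values
BF16_BITS_PER_WEIGHT = 16           # unquantized baseline for comparison


def weight_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


# Total parameter counts as stated in the description above.
MODELS = {"gpt-oss-120b": 117e9, "gpt-oss-20b": 21e9}

for name, params in MODELS.items():
    mxfp4 = weight_gb(params, MXFP4_BITS_PER_WEIGHT)
    bf16 = weight_gb(params, BF16_BITS_PER_WEIGHT)
    print(f"{name}: ~{mxfp4:.1f} GB in MXFP4 vs ~{bf16:.0f} GB in BF16")
```

Under these assumptions the weights alone come to roughly 62 GB for the 120b model and 11 GB for the 20b model, which is consistent with the 80GB and 16GB figures once headroom for the KV cache and activations is included.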
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.