qwenlm/qwen3

★ 27,204PythonAudience · developerComplexity · 3/5Setup · hard

Mindmap

mindmap
  root((Qwen3))
    What it does
      Text generation
      Reasoning tasks
      Multi-language support
    Modes
      Thinking mode
      Non-thinking mode
    Model sizes
      Small 0.6B
      Large 235B
      Mixture-of-Experts
    How to use
      Run locally
      Fine-tune custom
      Deploy at scale
    Tech approach
      Open-weight models
      100+ languages
      Parameter efficiency

mindmap root((Qwen3)) What it does Text generation Reasoning tasks Multi-language support Modes Thinking mode Non-thinking mode Model sizes Small 0.6B Large 235B Mixture-of-Experts How to use Run locally Fine-tune custom Deploy at scale Tech approach Open-weight models 100+ languages Parameter efficiency

Click or tap to explore — scroll the page freely

Things people build with this

USE CASE 1

Build an AI chatbot that switches to deeper reasoning mode when users ask math or coding questions.

USE CASE 2

Run a smaller Qwen3 model on your own hardware to avoid API costs while building a customer-facing AI feature.

USE CASE 3

Fine-tune Qwen3 on your company's internal documents to create a specialized assistant for your domain.

USE CASE 4

Deploy a large Qwen3 variant as a backend service for a multi-language customer support application.

Tech stack

PythonPyTorchTransformersCUDA

Getting it running

Difficulty · hard Time to first run · 1day+

Requires downloading large model weights (up to 235B parameters), CUDA/GPU setup, and significant disk/memory resources.

License could not be detected automatically. Check the repository's LICENSE file before use.

In plain English

Qwen3 is a family of large language models, the kind of AI system that generates text in response to prompts, developed by the Qwen team at Alibaba Cloud. A large language model is the same general type of system that powers chat assistants and code helpers: you give it a question or instruction and it produces a written answer. This repository hosts the documentation and pointers to the actual model weight files, which are published on Hugging Face and ModelScope. The README describes two main flavors. An instruct version is tuned for direct chat and following instructions. A thinking version is tuned for reasoning-heavy tasks such as math, logic, science, and code, and works through problems in more deliberate steps before answering. Both come in several sizes, from small models in the single-digit billions of parameters to large ones in the hundreds of billions, with some built as Mixture-of-Experts designs that activate only part of the network per request. Recent updates extend the context window to 256K tokens and, for some variants, up to 1 million tokens. Someone would use Qwen3 to build a chatbot, a coding assistant, a translator, or an agent that calls external tools, any application that needs to generate or reason over text, especially when they want an open-weight model they can run themselves rather than calling a closed API. The README highlights support for over 100 languages and dialects. The repository is primarily documentation in a Python project layout, pointing to inference with Hugging Face Transformers and to local or server deployment via llama.cpp, Ollama, LM Studio, SGLang, vLLM, and TGI.

Copy-paste prompts

Prompt 1

How do I download and run Qwen3 locally on my machine using Python?

Prompt 2

Show me how to switch Qwen3 between thinking mode and non-thinking mode in my application.

Prompt 3

What's the smallest Qwen3 model I can run, and how much GPU memory does it need?

Prompt 4

How do I fine-tune Qwen3 on my own dataset to make it better at my specific use case?

Prompt 5

Compare the speed and quality tradeoffs between different Qwen3 model sizes for my chatbot.

Open on GitHub → Explain another repo

← qwenlm on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.