qwenlm/qwen

Analysis updated 2026-05-18

★ 21,109PythonAudience · developerComplexity · 4/5LicenseSetup · moderate

Mindmap

mindmap
  root((Qwen))
    What it does
      Chat conversations
      Code generation
      Math solving
      Tool use and agents
    Model sizes
      1.8B parameters
      7B parameters
      14B parameters
      72B parameters
    How to use
      Inference quickstart
      Quantization
      Finetuning with LoRA
      Deploy with vLLM
    Training data
      3 trillion tokens
      Multilingual focus
      Chinese and English
    Integration options
      DashScope API
      Web demos
      CLI tools

mindmap root((Qwen)) What it does Chat conversations Code generation Math solving Tool use and agents Model sizes 1.8B parameters 7B parameters 14B parameters 72B parameters How to use Inference quickstart Quantization Finetuning with LoRA Deploy with vLLM Training data 3 trillion tokens Multilingual focus Chinese and English Integration options DashScope API Web demos CLI tools

Click or tap to explore — scroll the page freely

What do people build with it?

USE CASE 1

Build a chatbot or conversational AI assistant that understands Chinese and English.

USE CASE 2

Fine-tune a smaller Qwen model (1.8B or 7B) on your own data to solve domain-specific tasks.

USE CASE 3

Deploy a code-generation tool that writes and debugs code in multiple languages.

USE CASE 4

Create an AI agent that uses external tools and APIs to answer complex questions.

What is it built with?

PythonPyTorchvLLMFastChatLoRA

How does it compare?

	qwenlm/qwen	verl-project/verl	huggingface/peft
Stars	21,109	21,107	21,070
Language	Python	Python	Python
Setup difficulty	moderate	hard	moderate
Complexity	4/5	4/5	3/5
Audience	developer	researcher	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate Time to first run · 30min

Requires downloading large model weights (1.8B, 72B GB) and PyTorch/vLLM setup, inference works locally but training/fine-tuning needs GPU.

Use freely for any purpose including commercial. Keep the notice and disclose changes to the patent grant.

In plain English

Qwen is the original open-source large language model series from Alibaba Cloud (Tongyi Qianwen in Chinese). The repository hosts the first-generation Qwen models and their chat-tuned counterparts. The README opens with an important note that Qwen2 is now available in a separate repository (QwenLM/Qwen2), and that this repo is no longer actively maintained because the codebase has diverged. So the project is mainly a reference point for the original Qwen 1 generation rather than something you would start with today for production work. The series comes in four sizes: Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B. Each size is released as a base language model (the raw pretrained version) and as a chat model (Qwen-Chat), which has been aligned with human preferences through supervised fine-tuning and RLHF. The chat models can hold conversations, write and summarize text, extract information, translate, write code, solve math problems, call tools, act as agents, and even act as a code interpreter. Each chat model is also released in Int4 and Int8 quantized versions, which use less GPU memory at the cost of some precision. Downloads are hosted on Hugging Face and ModelScope. The base models were pretrained on up to 3 trillion tokens of multilingual data, with a particular focus on Chinese and English alongside many other languages and domains. The README reports the release dates, max context length (8K for Qwen-14B, 32K for the others), pretraining token counts, minimum GPU memory required for Q-LoRA finetuning (from about 5.8GB for the 1.8B model up to 61.4GB for the 72B), and minimum GPU memory for generating 2048 tokens with the Int4 quantized version (from about 2.9GB up to 48.9GB). All four sizes support tool usage. The repository documents how to get started with inference, how to use the quantized models including GPTQ and KV-cache quantization, performance statistics, finetuning tutorials (full-parameter, LoRA, and Q-LoRA), deployment instructions using vLLM and FastChat, how to build a WebUI or CLI demo, how to call the DashScope API service, how to build an OpenAI-style API in front of your local model, how to use Qwen for tool use and agents, long-context evaluation, FAQ, and the license. A technical report describing the series is published at arxiv.org/abs/2309.16609.

Copy-paste prompts

Prompt 1

How do I set up Qwen 7B for inference on my GPU? Walk me through the quickstart.

Prompt 2

I want to fine-tune Qwen 14B using LoRA on my custom dataset. What are the steps?

Prompt 3

Show me how to quantize a Qwen model so it runs on a smaller GPU with less memory.

Prompt 4

How do I deploy Qwen as a web demo using vLLM and FastChat?

Prompt 5

Can you explain how to enable tool use in Qwen chat models so they can call external APIs?

Frequently asked questions

What is qwen?

Alibaba's open-source family of large language models (1.8B, 72B parameters) trained on 3 trillion multilingual tokens, with chat versions for conversation, coding, math, and tool use.

What language is qwen written in?

Mainly Python. The stack also includes Python, PyTorch, vLLM.

What license does qwen use?

Use freely for any purpose including commercial. Keep the notice and disclose changes to the patent grant.

How hard is qwen to set up?

Setup difficulty is rated moderate, with roughly 30min to a first successful run.

Who is qwen for?

Mainly developer.

Open on GitHub → Explain another repo

This repo across BitVibe Labs

Scan in gitsafehub Deploy in gitdeployhub qwenlm on gitmyhub

Verify against the repo before relying on details.