qwenlm/qwen

★ 21,109PythonAudience · developerComplexity · 4/5LicenseSetup · moderate

Mindmap

mindmap
  root((Qwen))
    What it does
      Chat conversations
      Code generation
      Math solving
      Tool use and agents
    Model sizes
      1.8B parameters
      7B parameters
      14B parameters
      72B parameters
    How to use
      Inference quickstart
      Quantization
      Finetuning with LoRA
      Deploy with vLLM
    Training data
      3 trillion tokens
      Multilingual focus
      Chinese and English
    Integration options
      DashScope API
      Web demos
      CLI tools

mindmap root((Qwen)) What it does Chat conversations Code generation Math solving Tool use and agents Model sizes 1.8B parameters 7B parameters 14B parameters 72B parameters How to use Inference quickstart Quantization Finetuning with LoRA Deploy with vLLM Training data 3 trillion tokens Multilingual focus Chinese and English Integration options DashScope API Web demos CLI tools

Click or tap to explore — scroll the page freely

Things people build with this

USE CASE 1

Build a chatbot or conversational AI assistant that understands Chinese and English.

USE CASE 2

Fine-tune a smaller Qwen model (1.8B or 7B) on your own data to solve domain-specific tasks.

USE CASE 3

Deploy a code-generation tool that writes and debugs code in multiple languages.

USE CASE 4

Create an AI agent that uses external tools and APIs to answer complex questions.

Tech stack

PythonPyTorchvLLMFastChatLoRA

Getting it running

Difficulty · moderate Time to first run · 30min

Requires downloading large model weights (1.8B, 72B GB) and PyTorch/vLLM setup, inference works locally but training/fine-tuning needs GPU.

Use freely for any purpose including commercial. Keep the notice and disclose changes to the patent grant.

In plain English

Qwen is the original open-source large language model series from Alibaba Cloud (Tongyi Qianwen in Chinese). The repository hosts the first-generation Qwen models and their chat-tuned counterparts. The README opens with an important note that Qwen2 is now available in a separate repository (QwenLM/Qwen2), and that this repo is no longer actively maintained because the codebase has diverged. So the project is mainly a reference point for the original Qwen 1 generation rather than something you would start with today for production work. The series comes in four sizes: Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B. Each size is released as a base language model (the raw pretrained version) and as a chat model (Qwen-Chat), which has been aligned with human preferences through supervised fine-tuning and RLHF. The chat models can hold conversations, write and summarize text, extract information, translate, write code, solve math problems, call tools, act as agents, and even act as a code interpreter. Each chat model is also released in Int4 and Int8 quantized versions, which use less GPU memory at the cost of some precision. Downloads are hosted on Hugging Face and ModelScope. The base models were pretrained on up to 3 trillion tokens of multilingual data, with a particular focus on Chinese and English alongside many other languages and domains. The README reports the release dates, max context length (8K for Qwen-14B, 32K for the others), pretraining token counts, minimum GPU memory required for Q-LoRA finetuning (from about 5.8GB for the 1.8B model up to 61.4GB for the 72B), and minimum GPU memory for generating 2048 tokens with the Int4 quantized version (from about 2.9GB up to 48.9GB). All four sizes support tool usage. The repository documents how to get started with inference, how to use the quantized models including GPTQ and KV-cache quantization, performance statistics, finetuning tutorials (full-parameter, LoRA, and Q-LoRA), deployment instructions using vLLM and FastChat, how to build a WebUI or CLI demo, how to call the DashScope API service, how to build an OpenAI-style API in front of your local model, how to use Qwen for tool use and agents, long-context evaluation, FAQ, and the license. A technical report describing the series is published at arxiv.org/abs/2309.16609.

Copy-paste prompts

Prompt 1

How do I set up Qwen 7B for inference on my GPU? Walk me through the quickstart.

Prompt 2

I want to fine-tune Qwen 14B using LoRA on my custom dataset. What are the steps?

Prompt 3

Show me how to quantize a Qwen model so it runs on a smaller GPU with less memory.

Prompt 4

How do I deploy Qwen as a web demo using vLLM and FastChat?

Prompt 5

Can you explain how to enable tool use in Qwen chat models so they can call external APIs?

Open on GitHub → Explain another repo

← qwenlm on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.