zai-org/chatglm3

★ 13,699PythonAudience · developerComplexity · 4/5LicenseSetup · hard

Mindmap

mindmap
  root((ChatGLM3))
    What it does
      Bilingual chat AI
      Tool calling
      Code execution
    Model variants
      Standard 8K context
      Long context 32K
      Extra long 128K
    Tech Stack
      Python
      PyTorch
      HuggingFace
    Audience
      AI researchers
      App developers
      Chinese language users

mindmap root((ChatGLM3)) What it does Bilingual chat AI Tool calling Code execution Model variants Standard 8K context Long context 32K Extra long 128K Tech Stack Python PyTorch HuggingFace Audience AI researchers App developers Chinese language users

Click or tap to explore — scroll the page freely

Things people build with this

USE CASE 1

Run a local Chinese-English chatbot on your own machine without sending data to a cloud service

USE CASE 2

Build an AI agent that can call external APIs and execute code steps automatically using the built-in tool-calling mode

USE CASE 3

Summarize very long Chinese or English documents using the 32K or 128K context window variants

Tech stack

PythonPyTorchHuggingFaceTransformers

Getting it running

Difficulty · hard Time to first run · 1h+

Requires downloading multi-GB model weights from HuggingFace or ModelScope, a GPU is strongly recommended for reasonable response speed.

Free to use for academic research, commercial use is allowed but requires filling out a registration form with ZhipuAI before deploying.

In plain English

ChatGLM3 is an open-source conversational AI model that speaks both Chinese and English. It was built jointly by ZhipuAI and the KEG Lab at Tsinghua University. The main model in the series, ChatGLM3-6B, has 6 billion parameters and is designed to run on consumer hardware rather than requiring large data center infrastructure. The model goes beyond simple back-and-forth chat. It natively supports tool calling (where the model can invoke external functions or APIs on your behalf), code execution through a built-in interpreter, and multi-step agent tasks where it reasons through a problem in stages. The README describes a revised prompt format that makes these capabilities work without extra configuration. Four variants are available for download. The standard ChatGLM3-6B handles context windows up to 8,000 tokens, which is enough for most conversations. ChatGLM3-6B-32K extends that to 32,000 tokens, which helps with longer documents, and ChatGLM3-6B-128K pushes further still for very long-form reading tasks. A separate base model (without the chat fine-tuning) is also released for researchers who want to build on top of it. Benchmark scores show this family substantially outperformed comparably sized models on math, reasoning, and coding tests at the time of release. Long-document tasks showed average gains of over 50 percent compared to the previous generation. The weights are free to use for academic research. Commercial use is allowed after filling out a registration form. The README notes that the newer GLM-4 series has since been released and improves further on these results, so users who need the best current performance are pointed toward that newer family. To get started you clone the repository, install the Python dependencies, and then download the model weights from HuggingFace or ModelScope. A combined demo lets you switch between chat mode, tool-use mode, and code-interpreter mode in one interface. Third-party projects for faster inference on laptops, TPUs, and NVIDIA GPUs are also listed in the README.

Copy-paste prompts

Prompt 1

Write a Python script that loads ChatGLM3-6B from HuggingFace and answers questions about a long uploaded document

Prompt 2

Show me how to set up ChatGLM3's tool-calling mode so the model can look up real-time weather by invoking a function I define

Prompt 3

Write a Python script using ChatGLM3-6B to build a simple bilingual customer support chatbot that replies in the same language the user wrote in

Prompt 4

How do I run ChatGLM3-6B-32K to summarize a long meeting transcript? Show me the full inference code from loading to output

Open on GitHub → Explain another repo

← zai-org on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.