Run a local Chinese-English chatbot on your own machine without sending data to a cloud service
Build an AI agent that can call external APIs and execute code steps automatically using the built-in tool-calling mode
Summarize very long Chinese or English documents using the 32K or 128K context window variants
Requires downloading multi-GB model weights from HuggingFace or ModelScope; a GPU is strongly recommended for reasonable response speed.
ChatGLM3 is an open-source conversational AI model that speaks both Chinese and English. It was built jointly by ZhipuAI and the KEG Lab at Tsinghua University. The main model in the series, ChatGLM3-6B, has 6 billion parameters and is designed to run on consumer hardware rather than requiring large data center infrastructure.

The model goes beyond simple back-and-forth chat. It natively supports tool calling (where the model can invoke external functions or APIs on your behalf), code execution through a built-in interpreter, and multi-step agent tasks where it reasons through a problem in stages. The README describes a revised prompt format that makes these capabilities work without extra configuration.

Four variants are available for download. The standard ChatGLM3-6B handles context windows up to 8,000 tokens, which is enough for most conversations. ChatGLM3-6B-32K extends that to 32,000 tokens, which helps with longer documents, and ChatGLM3-6B-128K pushes further still for very long-form reading tasks. A separate base model (without the chat fine-tuning) is also released for researchers who want to build on top of it.

Benchmark scores show this family substantially outperformed comparably sized models on math, reasoning, and coding tests at the time of release. Long-document tasks showed average gains of over 50 percent compared to the previous generation.

The weights are free to use for academic research. Commercial use is allowed after filling out a registration form. The README notes that the newer GLM-4 series has since been released and improves further on these results, so users who need the best current performance are pointed toward that newer family.

To get started you clone the repository, install the Python dependencies, and then download the model weights from HuggingFace or ModelScope. A combined demo lets you switch between chat mode, tool-use mode, and code-interpreter mode in one interface.
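As a rough illustration of how the context-window variants described above might be chosen, the sketch below picks the smallest variant that fits a prompt plus some room for the reply. The helper name pick_variant, the reply budget, and the 128,000-token figure for the 128K variant are assumptions made for this example; only the 8,000 and 32,000 limits come from the text.

```python
# Context limits using the rounded token counts from the description above.
# The 128K figure is an assumption: the text gives no exact number for it.
VARIANTS = [
    ("ChatGLM3-6B", 8_000),
    ("ChatGLM3-6B-32K", 32_000),
    ("ChatGLM3-6B-128K", 128_000),
]


def pick_variant(prompt_tokens: int, reply_budget: int = 1_000) -> str:
    """Return the smallest variant whose window fits the prompt plus a reply.

    pick_variant and reply_budget are hypothetical names for this sketch,
    not part of the ChatGLM3 repository.
    """
    needed = prompt_tokens + reply_budget
    for name, limit in VARIANTS:
        if needed <= limit:
            return name
    raise ValueError(f"{needed} tokens exceeds the largest context window")


if __name__ == "__main__":
    for tokens in (2_000, 20_000, 100_000):
        print(tokens, "->", pick_variant(tokens))
```

Choosing the smallest variant that fits is only a starting heuristic; the token count should come from the model's own tokenizer rather than a character estimate.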
Third-party projects for faster inference on laptops, TPUs, and NVIDIA GPUs are also listed in the README.
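To make the tool-calling idea mentioned above more concrete, here is a minimal sketch of the application side: the model emits a structured request, and your code looks up the named function and runs it. The JSON shape, the TOOLS registry, and the get_weather function are all hypothetical and chosen for illustration; they are not ChatGLM3's actual prompt format, which is documented in the README.

```python
import json


def get_weather(city: str) -> str:
    """Placeholder tool; a real one would call an external API."""
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a JSON request and return a JSON reply.

    Assumed request shape: {"name": "...", "parameters": {...}}.
    """
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call.get("parameters", {}))
    return json.dumps({"result": result})


if __name__ == "__main__":
    request = '{"name": "get_weather", "parameters": {"city": "Beijing"}}'
    print(dispatch(request))
```

In an agent loop, the returned JSON would be passed back to the model as the tool's observation so it can continue reasoning or produce a final answer.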
Verify against the repo before relying on details.