Deploy a bilingual Chinese-English chatbot on a single consumer GPU with as little as 6GB VRAM
Fine-tune the model on your own dataset for a domain-specific assistant or research task
Run a local AI chat assistant with 32K context for long documents using the ChatGLM2-6B-32K variant
Use the model as a research base for studying bilingual language understanding and alignment
Requires a CUDA-capable GPU (at least 6GB of VRAM when using INT4 quantization), with PyTorch and Transformers installed via pip.
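To see why 6GB can be enough with INT4, here is a rough weights-only estimate. It is a back-of-the-envelope sketch, not a measurement: the parameter count is rounded to six billion, and activations, the attention cache, and framework overhead are ignored, so real usage will be higher.

```python
# Weights-only memory estimate for a ~6B-parameter model.
# Assumption: exactly 6e9 parameters, all stored at the stated precision.
# Activations, attention cache, and framework overhead are NOT included.
PARAMS = 6_000_000_000

BITS_PER_WEIGHT = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_memory_gb(params: int, bits: int) -> float:
    """Return the size of the weights alone, in gigabytes (10**9 bytes)."""
    return params * bits / 8 / 1e9


estimates = {
    name: weight_memory_gb(PARAMS, bits)
    for name, bits in BITS_PER_WEIGHT.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB for weights")
```

Under these assumptions the INT4 weights occupy about half of a 6GB card, which leaves room for everything the estimate leaves out.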
ChatGLM2-6B is the second generation of ChatGLM-6B, an open-source bilingual (Chinese and English) conversational large language model. "6B" refers to its size of roughly six billion parameters, which is small enough to run on a single consumer GPU while remaining capable enough for general chat. The repository contains the model code and the supporting scripts needed to download weights, run inference, and fine-tune the model on your own data.

Compared with the first generation, ChatGLM2-6B was upgraded in several areas. The base model was retrained on 1.4T Chinese and English tokens with the GLM mixed-objective function and aligned to human preferences, producing large gains on benchmarks such as MMLU, C-Eval, GSM8K, and BBH (the README quotes +23% on MMLU and +571% on GSM8K). The context length of the base model was extended from 2K to 32K tokens using FlashAttention; chat training used an 8K context, and a separate ChatGLM2-6B-32K variant targets longer documents. Multi-Query Attention makes inference more efficient: generation is roughly 42% faster than in the first generation, and under INT4 quantization a 6GB GPU can support a dialogue length of up to 8K. INT8 and INT4 quantization reduce memory use further with only a modest loss of accuracy.

ChatGLM2-6B is a reasonable choice if you want a freely available chatbot model that is strong in both Chinese and English, runs on a single GPU, and can be fine-tuned locally. It can be used for research and prototyping and, after registering through a form, for free commercial use. It is written in Python on top of PyTorch and Hugging Face Transformers, and its dependencies are installed with pip after cloning the repository.
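A minimal sketch of running inference, assuming the weights are published on the Hugging Face Hub under the ID `THUDM/chatglm2-6b` and that the model's remote code exposes `quantize()` and `chat()` as the upstream README describes. The helper names `load_chatglm2` and `chat_once` are hypothetical, and the remote code may need a specific Transformers version, so check the repository's requirements first.

```python
from typing import List, Optional, Tuple


def load_chatglm2(model_id: str = "THUDM/chatglm2-6b",
                  quant_bits: Optional[int] = 4):
    """Load tokenizer and model on a CUDA GPU (hypothetical helper).

    model_id is an assumed Hub ID; quant_bits may be 4, 8, or None (FP16).
    """
    # Imported lazily so this file can be imported without Transformers.
    from transformers import AutoModel, AutoTokenizer

    # trust_remote_code is needed because the model class ships with the weights.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

    if quant_bits is not None:
        # quantize() comes from the model's remote code, not from Transformers.
        model = model.quantize(quant_bits)
    else:
        model = model.half()

    return tokenizer, model.cuda().eval()


def chat_once(tokenizer, model, prompt: str,
              history: Optional[List[Tuple[str, str]]] = None):
    """Send one prompt; return the reply and the updated history."""
    response, history = model.chat(tokenizer, prompt, history=history or [])
    return response, history


# Example usage (downloads several GB of weights and needs a CUDA GPU):
#   tokenizer, model = load_chatglm2(quant_bits=4)
#   reply, history = chat_once(tokenizer, model, "你好")
#   reply, history = chat_once(tokenizer, model, "What can you do?", history)
```

Passing the returned `history` back into the next call is what keeps the conversation multi-turn. Note that quantizing at load time first reads the full-precision weights into CPU memory, so the host needs enough RAM for that step.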
Verify against the repo before relying on details.