thudm/slime

★ 5,670 · Python · Audience · researcher · Complexity · 5/5 · Setup · hard

Mindmap

mindmap
  root((slime))
    What it does
      Post-train LLMs
      Reinforcement learning
      Custom reward signals
    Supported models
      GLM family
      Qwen models
      DeepSeek V3
      Llama 3
    Core components
      Megatron training
      SGLang inference
      Data buffer
    Community use
      Physics reasoning
      GPU kernel gen
      Multimodal agents

Things people build with this

USE CASE 1

Fine-tune a large language model like Llama 3 or Qwen using reinforcement learning with custom reward signals

USE CASE 2

Build a data generation pipeline for RL-based post-training without rewriting the core training infrastructure

USE CASE 3

Reproduce or extend the training methodology used for Tsinghua's GLM model series
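Use case 1 hinges on a custom reward signal. Below is a minimal, self-contained sketch of what a rule-based reward for math reasoning might look like. It does not import slime: the `Sample` dataclass, the `math_reward` and `extract_answer` names, and the `\boxed{...}` answer convention are all hypothetical stand-ins. How such a function is registered with slime, and whether the real hook is async or takes extra arguments, should be checked in the repo's documentation.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for one rollout; slime's real sample type may differ.
@dataclass
class Sample:
    prompt: str
    response: str  # text generated by the policy
    label: str     # reference answer for the prompt


# Matches a literal "\boxed{...}" with no nested braces inside.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(text: str) -> Optional[str]:
    """Return the last \\boxed{...} answer in the text, or None if there is none."""
    matches = BOXED.findall(text)
    return matches[-1].strip() if matches else None


def math_reward(sample: Sample) -> float:
    """Binary reward: 1.0 if the final boxed answer matches the label, else 0.0."""
    answer = extract_answer(sample.response)
    if answer is None:
        return 0.0  # no parseable final answer earns nothing
    return 1.0 if answer == sample.label.strip() else 0.0


if __name__ == "__main__":
    s = Sample(prompt="What is 2+3?", response="2+3 = 5, so \\boxed{5}", label="5")
    print(math_reward(s))
```

A binary exact-match reward like this is a common starting point; partial credit or answer normalization (e.g. treating `0.5` and `1/2` as equal) would be natural extensions.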

Tech stack

Python · PyTorch · Megatron · SGLang · CUDA

Getting it running

Difficulty · hard · Time to first run · 1 day+

Requires a multi-GPU cluster with CUDA and separate configuration of Megatron for training and SGLang for inference.

The README does not specify a license for this project.

In plain English

slime is a Python framework for post-training large language models with reinforcement learning. Post-training is the stage after a model's initial training: you take the model and further improve its behavior using feedback signals, often to make it better at following instructions or reasoning through problems. In this context, reinforcement learning means the model is rewarded for producing good outputs and learns to do more of what works.

The framework comes from Tsinghua University and has powered several generations of the GLM model family, including GLM-5.1, GLM-5, and earlier versions. It also supports training Qwen models, DeepSeek V3, and Llama 3.

slime connects two underlying systems. The training side uses Megatron, a library for efficiently training large models across many GPUs. The inference side uses SGLang, a fast serving engine that generates text at scale. Between them sits a data buffer that manages which prompts and generated samples flow into training. Because generation and training are decoupled, the system can generate new training data while model updates run, which is more efficient than alternating between the two.

The framework also provides flexible interfaces for custom data generation workflows, so researchers can define their own reward signals or data pipelines without rewriting the core infrastructure.

Several external projects build on slime, ranging from physics reasoning models trained entirely through reinforcement learning, to tools for generating optimized GPU kernels, to multimodal agent training systems. The README links to each of these as examples of what the framework supports. Documentation and a quick start guide are available in the repository, and contributions are welcome.
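The efficiency argument for decoupling generation from training can be illustrated with a toy producer/consumer model. This is a sketch under stated assumptions, not slime's actual buffer: one thread stands in for SGLang rollouts, another for Megatron training steps, and a bounded `queue.Queue` plays the role of the data buffer. The timings (`GEN_TIME`, `TRAIN_TIME`) and batch count are made-up illustrative values. The point is only that overlapping the stages takes roughly `N * max(gen, train)` instead of `N * (gen + train)`.

```python
import queue
import threading
import time

# Hypothetical toy timings; real rollout and training step costs differ widely.
GEN_TIME = 0.05    # seconds to generate one batch (stand-in for SGLang rollouts)
TRAIN_TIME = 0.05  # seconds to train on one batch (stand-in for a Megatron step)
NUM_BATCHES = 8
_DONE = object()   # sentinel telling the trainer that generation has finished


def generator(buffer: "queue.Queue[object]") -> None:
    """Produce batches of fake rollouts and push them into the buffer."""
    for i in range(NUM_BATCHES):
        time.sleep(GEN_TIME)
        buffer.put([f"sample-{i}-{j}" for j in range(4)])  # blocks if buffer is full
    buffer.put(_DONE)


def trainer(buffer: "queue.Queue[object]", trained: list) -> None:
    """Pop batches from the buffer and run fake training steps until the sentinel."""
    while True:
        batch = buffer.get()
        if batch is _DONE:
            break
        time.sleep(TRAIN_TIME)
        trained.append(batch)


def run_overlapped() -> tuple[float, int]:
    """Generation and training run concurrently, connected by a bounded buffer."""
    buffer: "queue.Queue[object]" = queue.Queue(maxsize=2)
    trained: list = []
    start = time.perf_counter()
    producer = threading.Thread(target=generator, args=(buffer,))
    consumer = threading.Thread(target=trainer, args=(buffer, trained))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return time.perf_counter() - start, len(trained)


def run_sequential() -> tuple[float, int]:
    """Generate a batch, then train on it, strictly one after the other."""
    trained = 0
    start = time.perf_counter()
    for _ in range(NUM_BATCHES):
        time.sleep(GEN_TIME)
        time.sleep(TRAIN_TIME)
        trained += 1
    return time.perf_counter() - start, trained


if __name__ == "__main__":
    seq_time, _ = run_sequential()
    ovl_time, _ = run_overlapped()
    print(f"sequential: {seq_time:.2f}s  overlapped: {ovl_time:.2f}s")
```

One trade-off this toy hides: when generation runs ahead of training, some rollouts come from a slightly older policy, and a real system must also push updated weights from the trainer back to the inference engine. The queue's `maxsize` is one simple way to bound how far behind samples can get.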

Copy-paste prompts

Prompt 1

Show me how to set up a slime training job for Qwen-7B with a custom reward function targeting math reasoning tasks

Prompt 2

How do I define a custom data generation workflow in slime to produce training examples for a specific domain?

Prompt 3

What hardware configuration does slime require and how do I wire Megatron and SGLang together for RL training?

Prompt 4

Walk me through the slime quickstart for running reinforcement learning post-training on a small language model

Prompt 5

How does slime's data buffer work and why does separating generation from training improve efficiency?

Verify against the repo before relying on details.