verl-project/verl

Analysis updated 2026-05-18

★ 21,107 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((verl))
    What it does
      RL training for LLMs
      Hybrid controller model
      Memory efficient
    Tech stack
      Python
      PyTorch
      vLLM integration
      Megatron-LM support
    Use cases
      Train billion-param models
      GRPO and PPO algorithms
      Multi-GPU clusters
    Key features
      3D-HybridEngine
      FSDP compatible
      HuggingFace models

What do people build with it?

USE CASE 1

Train large language models using reinforcement learning algorithms like GRPO and PPO on multi-GPU clusters.

USE CASE 2

Integrate RL training with existing infrastructure like vLLM for text generation and Megatron-LM for model parallelism.

USE CASE 3

Reduce memory usage and communication overhead when switching between model training and inference phases.

USE CASE 4

Fine-tune HuggingFace models with reinforcement learning at scale up to hundreds of billions of parameters.

What is it built with?

Python, PyTorch, vLLM, Megatron-LM, FSDP, SGLang, HuggingFace

How does it compare?

	verl-project/verl	qwenlm/qwen	huggingface/peft
Stars	21,107	21,109	21,070
Language	Python	Python	Python
Setup difficulty	hard	moderate	moderate
Complexity	4/5	4/5	3/5
Audience	researcher	developer	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

Requires GPU/CUDA, multiple distributed training frameworks (vLLM, Megatron-LM, FSDP), and careful environment configuration.
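
Before attempting a full install, a short script can report which of the frameworks named above are importable in the current environment. This is a minimal sketch and not part of verl: the module names (`torch`, `vllm`, `sglang`, `megatron`, `transformers`, `verl`) are assumed top-level import names, and FSDP is not listed separately because it ships inside PyTorch. The script does not check versions, CUDA, or GPU availability.

```python
from importlib.util import find_spec

# Assumed top-level import names for the frameworks listed above.
# Adjust this list to match your own environment.
CANDIDATE_MODULES = ["torch", "vllm", "sglang", "megatron", "transformers", "verl"]


def check_modules(names):
    """Return {module_name: bool} saying whether each top-level module can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(CANDIDATE_MODULES).items():
        status = "found" if found else "missing"
        print(f"{name:<14}{status}")
```

A module reported as missing only means Python cannot locate it; a module reported as found may still fail at import time if its CUDA or driver requirements are not met.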

Use freely for any purpose, including commercial. Keep the license notice and state any changes you make. The license includes a patent grant.

In plain English

verl is an open-source library for reinforcement-learning (RL) post-training of large language models. It was started by the ByteDance Seed team and is now maintained by a wider community under the verl-project organization. The README describes it as the open-source version of HybridFlow, a research paper on a flexible and efficient framework for RLHF (reinforcement learning from human feedback). The library takes a pretrained or instruction-tuned LLM and runs RL algorithms on top of it. This is the kind of training step that produces models like DeepSeek-R1, or that turns a base model into one that reasons better.

The library is aimed at people who already work with LLM training infrastructure. The README highlights four things: an easy way to add new RL algorithms (such as GRPO and PPO) using a hybrid-controller programming model; integration with existing training and inference frameworks, including FSDP, Megatron-LM, vLLM, and SGLang; flexible mapping of different model roles onto different GPUs; and ready integration with Hugging Face models.

For performance, it advertises state-of-the-art training and generation throughput, and a 3D-HybridEngine that reshards the actor model between the training and generation phases without redundant memory copies and with reduced communication overhead.

The news section is long and worth scanning. verl has been used to train notable systems including DAPO (a reported state-of-the-art RL algorithm reaching 50 points on AIME 2024 starting from Qwen2.5-32B), ByteDance's Seed-Thinking-v1.5 reasoning model, and VAPO (a value-based PPO variant). The Megatron backend has been used to run RL on very large mixture-of-experts models such as DeepSeek-671B and Qwen3-235B, and there is a reported case of GRPO LoRA training a trillion-parameter model on 64 H800 GPUs. verl has been presented at PyTorch Conference 2025, PyTorch Conference Europe 2026, NVIDIA GTC26, and ICLR 2025.

The recipe directory (containing reproduction code for things like DAPO and ReTool) lives in a separate verl-recipe repository added as a git submodule, while experimental modules such as transfer_queue and fully_async_policy still live under verl/experimental. The README links to full documentation on Read the Docs, a Slack workspace, a Twitter account, a WeChat group, and the HybridFlow paper on arXiv.
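
For readers new to the two algorithms named above, the sketch below contrasts their core arithmetic in plain Python. It is an illustration of the general techniques, not verl's implementation or API: the function names, the `eps` and `clip_eps` defaults, and the use of the population standard deviation are all assumptions made for this example. GRPO scores each response relative to the other responses sampled for the same prompt, while PPO's clipped surrogate limits how far a single update can move the policy.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards within one group of
    responses sampled for the same prompt (hypothetical helper)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std is an assumption here
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped objective for one sample (to be maximized).

    The probability ratio is clipped to [1 - clip_eps, 1 + clip_eps],
    and the more pessimistic of the two products is kept.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Four sampled answers to one prompt, scored 1.0 (correct) or 0.0 (wrong).
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print(advantages)
    # A sample whose probability doubled under the new policy.
    print(clipped_surrogate(math.log(2.0), 0.0, advantages[0]))
```

In a PPO setup the advantage typically comes from a learned value model, whereas the group-relative form needs only the rewards of the sampled group, which is one reason it is attractive when a separate critic is expensive to train.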

Copy-paste prompts

Prompt 1

How do I set up verl to train a language model with PPO on multiple GPUs using my existing vLLM infrastructure?

Prompt 2

Show me how to use verl's 3D-HybridEngine to reduce memory overhead when alternating between training and generation.

Prompt 3

What are the key differences between GRPO and PPO in verl, and when should I use each one?

Prompt 4

How do I integrate a HuggingFace model into verl and start reinforcement learning training?

Prompt 5

Explain the hybrid-controller programming model in verl and how it separates computation from data dependencies.

Frequently asked questions

What is verl?

verl is a Python library for training large language models with reinforcement learning at scale. It integrates with existing AI infrastructure such as vLLM and Megatron-LM.

What language is verl written in?

Mainly Python. The stack also includes PyTorch, vLLM, Megatron-LM, FSDP, SGLang, and HuggingFace.

What license does verl use?

Use freely for any purpose, including commercial. Keep the license notice and state any changes you make. The license includes a patent grant.

How hard is verl to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is verl for?

Mainly researchers.

Verify against the repo before relying on details.