Analysis updated 2026-05-18
Train large language models using reinforcement learning algorithms like GRPO and PPO on multi-GPU clusters.
Integrate RL training with existing infrastructure like vLLM for text generation and Megatron-LM for model parallelism.
Reduce memory usage and communication overhead when switching between model training and inference phases.
Fine-tune HuggingFace models with reinforcement learning at scale up to hundreds of billions of parameters.
| | verl-project/verl | qwenlm/qwen | huggingface/peft |
|---|---|---|---|
| Stars | 21,107 | 21,109 | 21,070 |
| Language | Python | Python | Python |
| Setup difficulty | hard | moderate | moderate |
| Complexity | 4/5 | 4/5 | 3/5 |
| Audience | researcher | developer | researcher |
Figures from each repo's GitHub metadata at analysis time.
Requires GPU/CUDA, multiple distributed training and inference frameworks (FSDP, Megatron-LM, vLLM), and careful environment configuration.
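Because setup depends on several separately installed pieces, a quick pre-flight check can save time before a first run. The sketch below is not part of verl; the helper name `preflight` is hypothetical, and the import names `torch`, `vllm`, and `megatron` are assumptions to check against the installation guide. It only reports whether each module can be located and whether `nvidia-smi` is on the PATH.

```python
import importlib.util
import shutil

# Assumed top-level import names for the stack; adjust to your installation.
ASSUMED_MODULES = ["torch", "vllm", "megatron"]


def preflight(modules=ASSUMED_MODULES):
    """Return a dict mapping each dependency name to True if it was found."""
    # find_spec locates a top-level module without importing it.
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    # The NVIDIA driver tools normally ship nvidia-smi; its absence hints at no GPU setup.
    report["nvidia-smi"] = shutil.which("nvidia-smi") is not None
    return report


if __name__ == "__main__":
    for name, found in preflight().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A missing entry does not prove the environment is broken (some backends are optional), so treat the report as a starting point rather than a verdict.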
verl is an open-source library for reinforcement-learning (RL) post-training of large language models. It was started by the ByteDance Seed team and is now maintained by a wider community under the verl-project organization. The README describes it as the open-source version of HybridFlow, a research paper on a flexible and efficient framework for RLHF (reinforcement learning from human feedback). The point of the library is to take a pretrained or instruction-tuned LLM and run RL algorithms on top of it, the kind of training step that produces models like DeepSeek-R1 or that turns a base model into one that can reason better.
The library is aimed at people who already work with LLM training infrastructure. The README highlights an easy way to add new RL algorithms (such as GRPO and PPO) using a hybrid-controller programming model, integration with existing training and inference frameworks including FSDP, Megatron-LM, vLLM, and SGLang, flexible mapping of different model roles onto different GPUs, and ready integration with Hugging Face models. For performance, it advertises state-of-the-art training and generation throughput, and a 3D-HybridEngine that reshards the actor model between the training and generation phases without redundant memory copies and with reduced communication overhead.
The news section is long and worth scanning. verl has been used to train notable systems including DAPO (a reported state-of-the-art RL algorithm reaching 50 points on AIME 2024 starting from Qwen2.5-32B), ByteDance's Seed-Thinking-v1.5 reasoning model, and VAPO (a value-based PPO variant). The Megatron backend has been used to run RL on very large mixture-of-experts models such as DeepSeek-671B and Qwen3-235B, and there is a reported case of GRPO LoRA training a trillion-parameter model on 64 H800 GPUs. verl has been presented at PyTorch Conference 2025, PyTorch Conference Europe 2026, NVIDIA GTC26, and ICLR 2025.
The recipe directory (containing reproduction code for things like DAPO and ReTool) lives in a separate verl-recipe repository added as a git submodule, while experimental modules such as transfer_queue and fully_async_policy still live under verl/experimental. The README links to full documentation on Read the Docs, a Slack workspace, a Twitter account, a WeChat group, and the HybridFlow paper on arXiv.
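To make the GRPO mention above concrete: the core idea of GRPO is to score each sampled response relative to the other responses drawn for the same prompt, instead of training a separate value model. The sketch below illustrates that group-relative normalization in plain Python. It is not verl's implementation; the function name, the `eps` value, and the use of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Population standard deviation is an assumption of this sketch.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored 1.0 (correct) or 0.0 (incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct responses receive a positive advantage and incorrect ones a negative advantage; when all responses in a group score the same, every advantage is zero and the group contributes no learning signal.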
Python library for training large language models with reinforcement learning at scale, integrating with existing AI infrastructure like vLLM and Megatron-LM.
Mainly Python. The stack also includes PyTorch and vLLM.
Use freely for any purpose, including commercial use. Keep the license notice and disclose changes; the license includes a patent grant.
Setup difficulty is rated hard, with roughly a day or more to a first successful run.
Mainly researchers.
Verify against the repo before relying on details.