Improve an existing AI agent built with LangGraph or OpenAI's Agents SDK without rewriting it, by plugging in rLLM to train it through trial and feedback
Run standard AI benchmarks from the command line to evaluate how well a language model performs on tasks like math or finance
Train a small model to outperform much larger models on a specific domain by fine-tuning it with reinforcement learning on domain tasks
Works with existing agent frameworks with minimal code changes. The single-machine tinker backend runs on CPU; the multi-GPU verl backend is needed for large-scale training.
rLLM is an open-source Python framework for training AI agents with reinforcement learning. The premise is that you already have an agent built with whatever tools you prefer, and rLLM plugs in around it to improve the agent's behavior over time through trial and feedback, without requiring you to rewrite the agent from scratch.

The central loop is straightforward: your agent runs on a task, rLLM records every call the agent makes to a language model, you define a function that scores how well the agent did, and the framework uses that score to update the model's weights so it performs better on similar tasks in the future. This cycle of run, score, and update is what reinforcement learning means in this context.

rLLM works with a wide range of existing agent frameworks, including LangGraph, OpenAI's Agents SDK, Google's ADK, and others. Adding it to an existing project typically requires only a small change: swapping in a tracked client and adding a decorator to the function that runs your agent. The framework then handles tracing automatically.

For training at scale, rLLM supports two backends. One, verl, is designed for machines with multiple GPUs and handles distributed training. The other, tinker, runs on a single machine and also works on CPU, making it usable without specialized hardware.

The framework includes a command-line interface with over 50 built-in benchmarks for evaluation and training. A single command such as rllm eval gsm8k or rllm train gsm8k runs the full pipeline. The README cites results showing that models trained with rLLM can outperform much larger models on specific tasks, including a 4-billion-parameter model beating a 235-billion-parameter model on finance tasks. The full README contains more detail than this summary covers.
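The tracked-client-plus-decorator pattern described above can be illustrated with a minimal, framework-free sketch. It is not rLLM's API: TracedClient, traced_episode, exact_match_reward, math_agent and toy_model are all hypothetical names, and toy_model stands in for a real language-model endpoint. The client wrapper records every model call, and the decorator runs the unchanged agent function, then scores its final answer with a GSM8K-style check on the last number in the response.

```python
# Hypothetical sketch of tracing + scoring; not rLLM's actual API.
import functools
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LLMCall:
    """One recorded prompt/response pair."""
    prompt: str
    response: str


@dataclass
class TracedClient:
    """Hypothetical wrapper: forwards prompts to a model and records each call."""
    model: Callable[[str], str]
    calls: List[LLMCall] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        response = self.model(prompt)
        self.calls.append(LLMCall(prompt, response))
        return response


def traced_episode(reward_fn: Callable[[str, str], float]):
    """Hypothetical decorator: runs the agent, then scores its final answer.

    The wrapped function returns (answer, reward, trace) for a trainer to use.
    """
    def decorator(agent_fn):
        @functools.wraps(agent_fn)
        def wrapper(client: TracedClient, task: dict):
            client.calls.clear()  # start a fresh trace for this episode
            answer = agent_fn(client, task)
            reward = reward_fn(answer, task["target"])
            return answer, reward, list(client.calls)
        return wrapper
    return decorator


def exact_match_reward(answer: str, target: str) -> float:
    """Return 1.0 if the last number in the answer equals the target, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", answer)
    return 1.0 if numbers and numbers[-1] == target else 0.0


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM endpoint, with canned responses."""
    if prompt.startswith("Plan:"):
        return "Add the two numbers."
    return "19 + 23 = 42"


@traced_episode(exact_match_reward)
def math_agent(client: TracedClient, task: dict) -> str:
    # A two-step agent: draft a plan, then answer using that plan.
    plan = client.complete(f"Plan: {task['question']}")
    return client.complete(f"{plan}\nAnswer: {task['question']}")


if __name__ == "__main__":
    task = {"question": "What is 19 + 23?", "target": "42"}
    answer, reward, trace = math_agent(TracedClient(toy_model), task)
    print(reward, len(trace))
```

Keeping the recording inside the client, rather than inside the agent's logic, is what keeps the integration change small in this sketch: the agent body is untouched, and the trainer only consumes the returned reward and trace.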
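How a scalar score turns into a weight update can be shown on a toy problem. The summary does not say which algorithm rLLM's verl or tinker backends run, so this sketch swaps in plain REINFORCE with a running-mean baseline; the "model" is just a softmax over four candidate answers, and train, reward_fn and all parameter values are hypothetical choices for illustration. Each step samples an answer (run), scores it (score), and nudges the logits along the policy gradient (update).

```python
# Hypothetical toy example: plain REINFORCE, not rLLM's training code.
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_fn(action: int, correct: int) -> float:
    """Hypothetical task score: 1.0 for the right answer, 0.0 otherwise."""
    return 1.0 if action == correct else 0.0


def train(num_actions: int = 4, correct: int = 2, steps: int = 500,
          lr: float = 0.1, seed: int = 0) -> List[float]:
    """Run, score, update in a loop; returns the final action probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * num_actions  # the toy policy's "weights"
    baseline = 0.0                # running mean reward, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        # Run: sample an action from the current policy.
        action = rng.choices(range(num_actions), weights=probs)[0]
        # Score: evaluate the outcome with the reward function.
        reward = reward_fn(action, correct)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Update: for a softmax policy, d log pi(action) / d logit_i
        # equals one_hot(action)[i] - probs[i].
        for i in range(num_actions):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    final = train()
    print([round(p, 3) for p in final])
```

Subtracting the baseline does not change the expected gradient, because the expectation of one_hot(action) minus probs is zero; it only lowers the variance of each step. Real LLM training applies the same idea to token log-probabilities across a whole recorded trajectory.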
Author: rllm-org (gitmyhub profile listing every repo by this author).
Verify against the repo before relying on details.