explaingit

openpipe/art

9,454 · Python · Audience · developer · Complexity · 4/5 · Setup · hard

TLDR

An open-source Python library that uses reinforcement learning (GRPO) to train AI language models to improve at multi-step tasks such as searching emails, playing games, and writing queries. You define the task and a scoring function; ART handles the training loop.

Mindmap

mindmap
  root((repo))
    Training
      GRPO reinforcement
      Scoring function
      Automatic training loop
    Models
      Qwen
      Llama
      Open-source LLMs
    Infrastructure
      W&B cloud training
      Self-hosted GPU
      Inference service
    Examples
      Email search
      Game playing
      Tool calling
      LangGraph agents

Things people build with this

USE CASE 1

Train a language model to get better at calling tools in a specific workflow, like searching through emails

USE CASE 2

Use the W&B Training cloud option to fine-tune an AI agent on shared GPUs without setting up your own hardware

USE CASE 3

Teach an open-source model to play a strategy game through trial and error by writing a custom scoring function

USE CASE 4

Combine ART with LangGraph to improve a multi-step AI agent that uses external tools via reinforcement training
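
To make the "custom scoring function" in the use cases above concrete, here is a minimal sketch in plain Python. It does not use ART's API: `EpisodeResult`, `score_episode`, and the reward values are hypothetical names and numbers chosen for illustration, and the way a score is attached to a training run should be taken from the ART documentation.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Hypothetical summary of one finished game; not an ART type."""
    outcome: str        # "win", "draw", or "loss"
    illegal_moves: int  # moves the model attempted that broke the rules


def score_episode(result: EpisodeResult) -> float:
    """Map a finished game to a scalar reward clamped to [-1.0, 1.0].

    The reward values are illustrative assumptions: wins score highest,
    draws earn a little, losses are penalized, and every illegal move
    subtracts a fixed amount.
    """
    base = {"win": 1.0, "draw": 0.2, "loss": -0.5}[result.outcome]
    penalty = 0.25 * result.illegal_moves
    return max(-1.0, min(1.0, base - penalty))


if __name__ == "__main__":
    print(score_episode(EpisodeResult(outcome="win", illegal_moves=1)))
```

The design choice worth noting is that the score is a single number per attempt, so different attempts at the same task can be compared directly.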

Tech stack

Python · pip · LangGraph · Weights and Biases · Qwen · Llama

Getting it running

Difficulty · hard Time to first run · 1h+

Requires GPU hardware or a Weights and Biases Training account; model training is computationally intensive and not feasible on a standard laptop.

No license information is provided in the explanation.

In plain English

ART (Agent Reinforcement Trainer) is an open-source Python library from OpenPipe that teaches AI language models to improve at multi-step tasks by letting them practice and receive feedback. The underlying technique is GRPO (Group Relative Policy Optimization), a form of reinforcement learning: the model tries different approaches, gets scored on how well it did, and gradually adjusts its behavior. Think of it like on-the-job training: the agent does a task, sees the result, and learns what to do differently next time.

The library is aimed at developers who want to improve how their AI agents handle real-world tasks: searching emails, playing strategy games, writing database queries, summarizing documents, and more. It works with popular open-source models including Qwen, Llama, and others compatible with the same training format. You define the task and write a scoring function that tells the system when the agent did well or poorly; ART handles the training loop.

ART includes a serverless training option called W&B Training, provided through the Weights and Biases platform, which runs training on shared cloud infrastructure so you do not need to set up your own GPU servers. The README states this cuts costs by about 40% and speeds up training by about 28% compared to self-hosted setups. Each trained model checkpoint is immediately available through a companion inference service, so you can test your improved model right away without extra configuration.

For getting started, the project provides a set of runnable notebooks that walk through specific tasks step by step. Examples include teaching a model to play the game 2048, searching through emails using a tool-calling workflow, mastering Tic Tac Toe, and playing the word game Codenames. There are also examples that combine ART with LangGraph and the Model Context Protocol (MCP), a framework and a protocol, respectively, for building AI applications that use external tools.

The library installs via pip under the package name openpipe-art. Documentation is available on a dedicated site, and a Discord community is linked from the README for questions and support.
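
The "group relative" part of GRPO can be illustrated with a short sketch. In a common formulation, several attempts at the same task are scored, and each attempt's reward is compared with the mean and spread of its own group. The function below is a standalone illustration under that assumption; `group_relative_advantages` and `eps` are hypothetical names, and ART's internal implementation may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and spread of its own group.

    Attempts that beat the group average get a positive value and are
    reinforced; attempts below average get a negative value. `eps`
    (an assumed small constant) avoids division by zero when every
    attempt in the group received the same reward.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four attempts at the same task, scored by a scoring function.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because the values are relative to the group, a group in which every attempt scores the same produces all-zero advantages and therefore no learning signal, which is one reason the scoring function should be able to distinguish better attempts from worse ones.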

Copy-paste prompts

Prompt 1
I want to use openpipe-art to train a Qwen model to get better at searching emails with tool calls. Show me how to define the task environment and write a scoring function.
Prompt 2
Using ART's W&B Training option, how do I start a training run on shared cloud GPUs and then test my improved model through the companion inference service?
Prompt 3
Show me a minimal ART training script that teaches a language model to play Tic Tac Toe using GRPO reinforcement learning, including the scoring function that rewards wins.
Prompt 4
How do I combine ART with LangGraph to build a multi-step AI agent that uses external tools and then improve its tool-calling accuracy through reinforcement training?
Prompt 5
I want to fine-tune a Llama model with ART to get better at writing SQL queries. What does the scoring function look like and how do I define the task environment?

Verify against the repo before relying on details.