explaingit

carperai/trlx

4,747 | Python | Audience · researcher | Complexity · 5/5 | Setup · hard

TLDR

A Python framework for improving AI language models by training them with feedback scores, using the same technique, reinforcement learning from human feedback, that was used to refine ChatGPT after its initial training.

Mindmap

mindmap
  root((trlX))
    What it does
      Fine-tune language models
      Feedback-based training
      Reward signal learning
    Algorithms
      PPO
      ILQL
    Integrations
      Hugging Face models
      DeepSpeed
      NVIDIA NeMo
      Ray Tune
    Scale
      Single GPU
      Multi-GPU distributed
      Very large models

Things people build with this

USE CASE 1

Take an open-source language model from Hugging Face and train it to follow instructions better using a custom Python scoring function.

USE CASE 2

Apply reinforcement learning from human feedback to a chatbot so it learns to give more helpful answers over time.

USE CASE 3

Run large-scale RLHF training across multiple GPUs using DeepSpeed or NVIDIA NeMo for models too large for a single machine.

USE CASE 4

Experiment with RLHF techniques in a Colab notebook before scaling up to a full training run.

Tech stack

Python · PyTorch · Hugging Face · DeepSpeed · NVIDIA NeMo · Ray

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires GPU hardware with a compatible CUDA environment. Training large models also requires configuring DeepSpeed or NVIDIA NeMo.

In plain English

trlX is a Python framework for taking a language model that has already been trained and making it better through a process called reinforcement learning from human feedback, or RLHF. The idea is that after a model learns to predict text, you can further adjust its behavior by giving it scores for the responses it produces. The model then learns to generate responses that score higher. This is the same general technique used to improve models like ChatGPT after their initial training phase.

The framework accepts two kinds of guidance during training. You can supply a reward function, which is a piece of code that evaluates each generated response and returns a number. Or you can supply a dataset of example responses that already have scores attached. Either way, trlX runs the training loop and adjusts the model weights based on the feedback signal.

Two reinforcement learning algorithms are available: Proximal Policy Optimization (PPO) and Implicit Language Q-Learning (ILQL). Both are described in academic papers that the README links to.

On the model side, the framework works with models hosted on Hugging Face, including well-known open models like those from EleutherAI and Google. For models up to about 20 billion parameters, training is handled through a tool called Accelerate. For larger models, the framework integrates with NVIDIA NeMo, which uses additional parallelism techniques to spread the computation across many machines. Distributed training with multiple GPUs is supported through DeepSpeed and NeMo-Megatron.

Hyperparameter search can be run using Ray Tune. Experiment logs and training curves can be tracked with Weights and Biases. The trained model can be saved in a format compatible with Hugging Face, making it easy to share or deploy.

The project includes Colab notebooks for quick experimentation and was presented as an academic paper at EMNLP 2023. It was built by CarperAI as part of their work on open-source RLHF tooling.
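To make the reward-function idea concrete, here is a minimal sketch of a scoring function that could be passed to trlX. The scoring rule (reward the word "thanks", penalize length) and the helper name `run_online_training` are illustrative assumptions, not part of the library. The `trlx.train(model, reward_fn=...)` call follows the pattern shown in the project's README, so check it against the version you install.

```python
from typing import List


def reward_fn(samples: List[str], **kwargs) -> List[float]:
    """Toy reward (an assumption for illustration): favor short, polite text.

    Takes a batch of generated strings and returns one float per string.
    Higher scores mean the response is preferred.
    """
    scores = []
    for text in samples:
        politeness = 1.0 if "thanks" in text.lower() else 0.0
        length_penalty = 0.01 * len(text.split())  # 0.01 per word
        scores.append(politeness - length_penalty)
    return scores


def run_online_training(model_name: str = "gpt2"):
    """Hypothetical wrapper: start training with the reward function above.

    Needs trlx installed and suitable GPU hardware, so it is not called here.
    """
    import trlx  # imported lazily so reward_fn can be tested without trlx

    return trlx.train(model_name, reward_fn=reward_fn)
```

Keeping the reward function free of trlX imports means it can be unit-tested on plain strings before any GPU time is spent.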

Copy-paste prompts

Prompt 1
Using trlX with the PPO algorithm, show me how to fine-tune a Hugging Face language model using a custom reward function that scores responses by helpfulness and conciseness.
Prompt 2
I have a dataset of prompt, response, and score triples. Show me how to load it into trlX and run an ILQL training run to fine-tune a language model on that feedback.
Prompt 3
How do I configure trlX to use DeepSpeed for multi-GPU training of a 7 billion parameter language model across 4 GPUs?
Prompt 4
Show me how to track a trlX training run with Weights and Biases and save the final model in Hugging Face format so I can deploy or share it.
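As a starting point for Prompt 2, the sketch below reshapes prompt, response, and score triples into the parallel `samples` and `rewards` lists that the trlX README uses for training on a pre-scored dataset. The names `split_triples` and `run_offline_training` are hypothetical helpers introduced for illustration. The `trlx.train(..., samples=..., rewards=...)` call mirrors the README example and should be verified against the installed version.

```python
from typing import List, Sequence, Tuple

# One scored example: (prompt, response, score)
Triple = Tuple[str, str, float]


def split_triples(triples: Sequence[Triple]) -> Tuple[List[List[str]], List[float]]:
    """Turn (prompt, response, score) triples into two parallel lists."""
    samples = [[prompt, response] for prompt, response, _ in triples]
    rewards = [float(score) for _, _, score in triples]
    return samples, rewards


def run_offline_training(triples: Sequence[Triple], model_name: str = "gpt2"):
    """Hypothetical wrapper: train on an already-scored dataset.

    Needs trlx installed and suitable GPU hardware, so it is not called here.
    """
    import trlx  # imported lazily so split_triples can be tested without trlx

    samples, rewards = split_triples(triples)
    return trlx.train(model_name, samples=samples, rewards=rewards)
```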

Verify against the repo before relying on details.