Take an open-source language model from Hugging Face and train it to follow instructions better using a custom Python scoring function.
Apply reinforcement learning from human feedback to a chatbot so it learns to give more helpful answers over time.
Run large-scale RLHF training across multiple GPUs using DeepSpeed or NVIDIA NeMo for models too large for a single machine.
Experiment with RLHF techniques in a Colab notebook before scaling up to a full training run.
Requires GPU hardware with a compatible CUDA environment. Training large models also requires DeepSpeed or NVIDIA NeMo configuration.
trlX is a Python framework for taking a language model that has already been trained and improving it through reinforcement learning from human feedback, or RLHF. The idea is that after a model learns to predict text, you can further adjust its behavior by scoring the responses it produces. The model then learns to generate responses that score higher. This is the same general technique used to improve models like ChatGPT after their initial training phase.

The framework accepts two kinds of guidance during training. You can supply a reward function, which is a piece of code that evaluates each generated response and returns a number. Alternatively, you can supply a dataset of example responses that already have scores attached. Either way, trlX runs the training loop and adjusts the model weights based on the feedback signal. Two reinforcement learning algorithms are available: Proximal Policy Optimization (PPO) and Implicit Language Q-Learning (ILQL). Both are described in academic papers that the README links to.

On the model side, the framework works with models hosted on Hugging Face, including well-known open models such as those from EleutherAI and Google. For models up to about 20 billion parameters, training is handled through Hugging Face Accelerate. For larger models, the framework integrates with NVIDIA NeMo, which uses additional parallelism techniques to spread the computation across many machines. Distributed training across multiple GPUs is supported through DeepSpeed and NeMo-Megatron.

Hyperparameter search can be run with Ray Tune, and experiment logs and training curves can be tracked with Weights & Biases. The trained model can be saved in a Hugging Face-compatible format, making it easy to share or deploy.

The project includes Colab notebooks for quick experimentation and was presented as an academic paper at EMNLP 2023. It was built by CarperAI as part of their work on open-source RLHF tooling.
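The two kinds of guidance described above (a reward function, or a dataset of pre-scored responses) might look roughly like the sketch below. It assumes the `trlx.train(...)` entry point with `reward_fn`, `samples`, and `rewards` keyword arguments, following the pattern shown in the project's README; the scoring rule, the `gpt2` model name, the helper function names, and the example data are all hypothetical placeholders. Treat this as an illustration and check the current signatures in the repo.

```python
from typing import List


def reward_fn(samples: List[str], **kwargs) -> List[float]:
    """Toy scoring function (hypothetical): one point per occurrence of 'please'.

    Receives a batch of generated texts and returns one number per text.
    Extra keyword arguments passed by the trainer are accepted and ignored.
    """
    return [float(sample.lower().count("please")) for sample in samples]


def train_with_reward_fn(model_name: str = "gpt2"):
    """Online guidance: the model generates text and reward_fn scores it."""
    # Lazy import so the scoring function can be tested without trlX or a GPU.
    import trlx

    # Assumed call shape, based on the README; verify before relying on it.
    return trlx.train(model_name, reward_fn=reward_fn)


def train_with_scored_dataset(model_name: str = "gpt2"):
    """Offline guidance: example responses that already have scores attached."""
    import trlx

    # Made-up illustration data: one score per sample, in the same order.
    samples = ["Could you please clarify?", "Figure it out yourself."]
    rewards = [1.0, 0.0]
    return trlx.train(model_name, samples=samples, rewards=rewards)
```

Keeping the scoring logic in a plain function that takes a list of strings means it can be checked on its own, for example `reward_fn(["Please wait, please.", "No."])`, before committing GPU time to a training run.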
Verify against the repo before relying on details.