Run RLHF or DPO training to align a multimodal AI model with human feedback
Fine-tune Llama or Qwen3 models using supervised fine-tuning on your own examples
Run alignment training experiments on a Slurm cluster at a university or research lab
Use as a course homework platform for studying large language model alignment
Requires Nvidia GPUs or Huawei Ascend processors, plus a Slurm cluster or equivalent compute environment.
Align-Anything is a research framework from the Peking University Alignment Team for training AI models to better follow human intentions and values. The core problem it addresses is that large AI models, especially ones that handle images, video, or audio in addition to text, often behave in ways their creators did not intend. This project provides tools to run the training procedures that push a model's behavior closer to what humans actually want.

The framework supports several training methods that researchers use for this kind of correction. SFT (supervised fine-tuning) trains a model on examples of correct behavior. DPO (direct preference optimization) and PPO (proximal policy optimization) use human or automated feedback to adjust how the model responds. RLHF, which stands for reinforcement learning from human feedback, is the broader family that PPO-based training belongs to; DPO learns from the same kind of preference data but optimizes the model directly, without a separate reinforcement learning loop. The project also supports GRPO (group relative policy optimization), a method associated with the DeepSeek R1 model.

What makes this project notable compared to similar tools is that it works across many input and output types at once. Most alignment frameworks focus on text. This one is designed to handle models that take in text, images, video, or audio, and produce outputs in those same formats. The README describes support for a range of publicly available model families including Qwen3, Llama, and various multimodal models from other research groups.

The project is used as the homework platform for a Peking University course on large language models. It supports training on both Nvidia GPUs and Huawei Ascend processors, and it can run on Slurm clusters, which are the shared computing systems common in academic and research settings.

This is a tool aimed at AI researchers and engineers who want to run alignment training on their own models. It is not an end-user product. The documentation includes notebooks with step-by-step tutorials for common training scenarios.
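To make the preference-based training mentioned above more concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not Align-Anything's API: the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are all assumptions chosen for illustration. The inputs are the summed log-probabilities that the model being trained (the policy) and a frozen reference model assign to the preferred ("chosen") and dispreferred ("rejected") responses.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Illustrative DPO loss for one preference pair (hypothetical helper).

    Each argument is the total log-probability of a full response under
    either the policy or the frozen reference model. `beta` scales how
    strongly the policy is pushed away from the reference.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp

    # Scaled margin: positive when the policy prefers the chosen response
    # more strongly than the reference model does.
    z = beta * (chosen_gain - rejected_gain)

    # Loss is -log(sigmoid(z)) = log(1 + exp(-z)), computed in a form
    # that avoids overflow for large |z|.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Policy favors the chosen response more than the reference does,
    # so the loss falls below log(2), the value at zero margin.
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

The loss shrinks as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model. In a real training run the same quantity would be computed over batches of tensors and averaged; consult the repository for how the framework itself implements it.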
Verify against the repo before relying on details.