pku-alignment/align-anything

★ 4,650 | Python | Audience · researcher | Complexity · 5/5 | Setup · hard

Mindmap

mindmap
  root((align-anything))
    What it does
      Alignment training
      Multimodal support
    Training methods
      SFT
      DPO
      PPO and RLHF
    Supported models
      Llama
      Qwen3
    Infrastructure
      Nvidia GPU
      Huawei Ascend
      Slurm cluster

Things people build with this

USE CASE 1

Run RLHF or DPO training to align a multimodal AI model with human feedback

USE CASE 2

Fine-tune Llama or Qwen3 models using supervised fine-tuning on your own examples

USE CASE 3

Run alignment training experiments on a Slurm cluster at a university or research lab

USE CASE 4

Use as a course homework platform for studying large language model alignment

Tech stack

Python · PyTorch · CUDA · Slurm · Huawei Ascend

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires Nvidia GPUs or Huawei Ascend NPUs, plus a Slurm cluster or equivalent compute environment.

In plain English

Align-Anything is a research framework from the Peking University Alignment Team for training AI models to better follow human intentions and values. The core problem it addresses is that large AI models, especially ones that handle images, video, or audio in addition to text, often behave in ways their creators did not intend. This project provides tools to run the training procedures that push a model's behavior closer to what humans actually want.

The framework supports several training methods that researchers use for this kind of correction. SFT (supervised fine-tuning) trains a model on examples of correct behavior. DPO (direct preference optimization) and PPO (proximal policy optimization) use human or automated feedback to adjust how the model responds: PPO optimizes against a reward signal with reinforcement learning, while DPO learns directly from pairs of preferred and rejected responses. RLHF, which stands for reinforcement learning from human feedback, is the broader family of feedback-driven training that these methods are usually grouped under. The project also supports GRPO, a method associated with the DeepSeek R1 model.

What makes this project notable compared to similar tools is that it works across many input and output types at once. Most alignment frameworks focus on text. This one is designed to handle models that take in text, images, video, or audio, and produce outputs in those same formats. The README describes support for a range of publicly available model families including Qwen3, Llama, and various multimodal models from other research groups.

The project is used as the homework platform for a Peking University course on large language models. It supports training on both Nvidia GPUs and Huawei Ascend processors, and it can run on Slurm clusters, which are the shared computing systems common in academic and research settings.

This is a tool aimed at AI researchers and engineers who want to run alignment training on their own models. It is not an end-user product. The documentation includes notebooks with step-by-step tutorials for common training scenarios.
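To make two of the methods named above more concrete, here is a minimal sketch of the standard DPO loss for one preference pair and of the group-relative reward normalization commonly described for GRPO. It uses only the Python standard library and is not taken from align-anything: the function names (`log_sigmoid`, `dpo_loss`, `grpo_advantages`), the default `beta=0.1`, and the choice of population standard deviation are all assumptions made for illustration. The framework's own implementation runs on PyTorch tensors and may differ in detail.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model. Hypothetical
    helper for illustration; not align-anything's API.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    # The loss shrinks as the policy favors the chosen response more strongly.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of responses to the same prompt.

    Assumption: population standard deviation; implementations vary.
    """
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy prefers the chosen response more than the reference does.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
    # Above-average rewards get positive advantages, below-average negative.
    print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

When the policy and the reference agree, the margin is zero and the loss sits at log(2). Training lowers it by raising the likelihood of preferred responses relative to rejected ones, without fitting a separate reward model.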

Copy-paste prompts

Prompt 1

Using align-anything, write a training script that fine-tunes a Llama model with DPO using a custom human preference dataset.

Prompt 2

How do I configure align-anything to run PPO training on a multimodal model that accepts both text and images?

Prompt 3

Set up an align-anything training job on a Slurm cluster with Huawei Ascend NPUs. What config files do I need?

Prompt 4

Show me how to add the GRPO training method to an existing model in align-anything, following the framework's patterns.

Prompt 5

Generate a step-by-step tutorial notebook for running supervised fine-tuning with align-anything on a small dataset.

Verify against the repo before relying on details.