paddlepaddle/paddleformers

★ 12,984 | Python | Audience · researcher | Complexity · 5/5 | Setup · hard

Mindmap

mindmap
  root((paddleformers))
    Supported Models
      DeepSeek V3
      Llama 3
      Qwen series
    Training
      Tensor parallelism
      Pipeline parallelism
      LoRA fine-tuning
    Deployment
      vLLM export
      SGLang export
    Infrastructure
      CUDA GPUs
      Docker install


Things people build with this

USE CASE 1

Fine-tune a large language model like Llama-3 or Qwen2 on your own dataset using LoRA with far fewer GPU resources than full retraining.
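To make the "far fewer GPU resources" claim concrete, the sketch below counts trainable parameters for one linear layer under LoRA versus full fine-tuning. It is plain Python arithmetic and does not use the PaddleFormers API. The layer size (4096 × 4096) and rank (8) are hypothetical values chosen only for illustration.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the dense weight matrix, which LoRA keeps frozen."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


# Hypothetical layer size and rank, for illustration only.
d_in, d_out, rank = 4096, 4096, 8
full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

Under these assumptions the low-rank factors hold well under 1% of the layer's parameters, so gradients and optimizer state are only needed for that small fraction.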

USE CASE 2

Train large AI models across many GPUs using tensor and pipeline parallelism for faster throughput.
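As a rough illustration of the idea behind tensor parallelism, the sketch below splits one weight matrix by columns across simulated devices, lets each compute its slice of the output, and concatenates the results. It is pure Python on nested lists and is not how PaddleFormers implements it. All function names here are hypothetical.

```python
def matmul(x, w):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, parts):
    """Split a weight matrix into `parts` contiguous column shards of equal width."""
    cols = len(w[0])
    if cols % parts != 0:
        raise ValueError("column count must be divisible by the number of shards")
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def column_parallel_matmul(x, w, parts):
    """Each simulated device multiplies x by its own shard; outputs are concatenated."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
assert column_parallel_matmul(x, w, parts=2) == matmul(x, w)
```

Each shard holds only a fraction of the weights, which is what lets a layer too large for one GPU be spread across several. On real hardware the concatenation step becomes a collective communication between devices.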

USE CASE 3

Export a PaddleFormers-trained model to a format compatible with vLLM or SGLang for production deployment.

Tech stack

Python, PaddlePaddle, CUDA, Docker, LoRA

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires CUDA-enabled GPUs and Python 3.10 or later. Install via the Docker image or pip. Training large models needs a significant amount of GPU memory.
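For a sense of scale on the memory requirement, the sketch below estimates how much memory the model weights alone occupy at different numeric precisions. It is generic arithmetic, not a PaddleFormers utility. The 8-billion-parameter model size is a hypothetical example, and the helper name is made up for illustration.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory for the weights only, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Hypothetical 8-billion-parameter model, for illustration only.
num_params = 8_000_000_000
for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(num_params, dtype):.1f} GiB")
```

This counts weights only. Training also has to hold gradients, optimizer state, and activations, so actual usage is several times higher than this figure.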

In plain English

PaddleFormers is a library for training large AI models, built on PaddlePaddle, Baidu's deep learning framework. It provides a model library and training toolkit similar in purpose to Hugging Face Transformers, but optimized for the PaddlePaddle ecosystem and for high-performance distributed training across many GPUs.

The library supports over 100 models, covering both large language models (models that process and generate text) and vision-language models (models that handle images and text together). Supported model families include DeepSeek-V3, Qwen2 and Qwen3, Llama-3, GLM-4.5, Baidu's own ERNIE-4.5 series, and several others. The README is written primarily in Chinese.

The main technical focus is training efficiency. The library implements strategies for spreading training across many GPUs and machines at once, including tensor parallelism, pipeline parallelism, and expert parallelism. It also uses lower-precision arithmetic and other optimizations to reduce memory and compute usage during training. According to the README, training speed for key models such as DeepSeek-V3 and GLM-4.5-Air exceeds that of Megatron-LM, a widely used framework that often serves as the baseline for large-scale training performance.

Beyond pretraining from scratch, the library supports the full workflow, including alignment training methods and fine-tuning techniques such as LoRA (a method that adapts a model by training far fewer parameters than full fine-tuning). Models trained with PaddleFormers can be saved in a format compatible with tools such as vLLM and SGLang, so they can be deployed outside of PaddlePaddle.

Installation is via Docker image or pip, and requires Python 3.10 or later along with CUDA-enabled GPUs for training.
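One cost of the pipeline parallelism mentioned above is idle time while the pipeline fills and drains. The sketch below computes that idle fraction for a simple GPipe-style schedule, assuming every stage takes the same time per microbatch. The source does not say which schedule PaddleFormers uses, so treat this as a generic illustration with a hypothetical function name.

```python
def pipeline_bubble_fraction(stages: int, microbatches: int) -> float:
    """Idle fraction of a GPipe-style schedule with equal-cost stages.

    One pass takes (microbatches + stages - 1) time slots per stage,
    of which (stages - 1) slots are spent waiting.
    """
    if stages < 1 or microbatches < 1:
        raise ValueError("stages and microbatches must be positive")
    return (stages - 1) / (microbatches + stages - 1)


# Hypothetical configurations, for illustration only.
for m in (4, 32):
    frac = pipeline_bubble_fraction(stages=4, microbatches=m)
    print(f"4 stages, {m} microbatches: {frac:.1%} idle")
```

Under this model, adding microbatches shrinks the idle fraction, which is why pipeline-parallel training typically splits each batch into many microbatches.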

Copy-paste prompts

Prompt 1

How do I fine-tune Llama-3 using LoRA with PaddleFormers? Show me the training configuration and launch command.

Prompt 2

How do I set up distributed training across 8 GPUs in PaddleFormers using tensor parallelism? Which config flags do I need to set?

Prompt 3

How do I install PaddleFormers via Docker and run a basic model training job to verify the setup works?

Prompt 4

How do I export a model trained with PaddleFormers so I can serve it with vLLM for inference?

Prompt 5

What is the difference between tensor parallelism and pipeline parallelism in PaddleFormers, and when should I use each?


Verify against the repo before relying on details.