paddlepaddle/paddlenlp

★ 12,941 | Python | Audience · researcher | Complexity · 5/5 | Setup · hard

Mindmap

mindmap
  root((PaddleNLP))
    What it does
      LLM training
      Fine-tuning
      Model deployment
    Supported Models
      LLaMA Qwen
      DeepSeek Mistral
    Tech Stack
      Python
      PaddlePaddle
      GPU and NPU
    Audience
      AI researchers
      ML engineers


Things people build with this

USE CASE 1

Fine-tune a LLaMA, Qwen, or DeepSeek model on your own dataset using PaddleNLP's efficient training pipeline.

USE CASE 2

Run quantized DeepSeek-R1 inference at over 2,100 tokens per second on a single machine.

USE CASE 3

Train and deploy models on Chinese-made AI chips like Kunlun XPU or Ascend NPU using the same code as Nvidia GPUs.

USE CASE 4

Merge weights from multiple fine-tuned model versions into a single model using the built-in MergeKit tool.
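To make the idea of weight merging concrete, here is a minimal sketch of the simplest merge strategy, a weighted linear average of parameters that share the same name. It is written in plain Python and does not use PaddleNLP or its MergeKit tool: the function name `merge_linear`, the list-of-floats "tensors", and the toy checkpoints are all hypothetical, chosen only for illustration. The actual tool may offer other strategies and a different interface, so check the repo before relying on this.

```python
from typing import Dict, List, Sequence

# A toy "state dict": parameter name -> flat list of floats.
# Real checkpoints hold tensors; flat lists keep the sketch dependency-free.
StateDict = Dict[str, List[float]]


def merge_linear(state_dicts: Sequence[StateDict],
                 weights: Sequence[float]) -> StateDict:
    """Weighted average of same-named parameters (hypothetical helper)."""
    if not state_dicts:
        raise ValueError("need at least one state dict")
    if len(state_dicts) != len(weights):
        raise ValueError("one weight per state dict is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Normalize so the merged parameters stay on the same scale.
    norm = [w / total for w in weights]

    keys = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != keys:
            raise ValueError("all models must have the same parameter names")

    merged: StateDict = {}
    for name in state_dicts[0]:
        tensors = [sd[name] for sd in state_dicts]
        length = len(tensors[0])
        if any(len(t) != length for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            sum(w * t[i] for w, t in zip(norm, tensors))
            for i in range(length)
        ]
    return merged


# Two toy fine-tuned versions of the same model, merged with equal weight.
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = merge_linear([model_a, model_b], weights=[1.0, 1.0])
print(merged)
```

Averaging only makes sense when every version was fine-tuned from the same base architecture, which is why the sketch rejects mismatched parameter names or shapes.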

Tech stack

Python · PaddlePaddle · CUDA · Nvidia GPU · Ascend NPU · Kunlun XPU

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires an Nvidia GPU or a supported Chinese AI chip. The README is primarily in Chinese.

In plain English

PaddleNLP is a Python library for building, training, and running large language models. It comes from Baidu's PaddlePaddle AI team and is designed to make modern language models practical for real-world applications. The README is written primarily in Chinese, reflecting the project's origin and main user community.

The library covers the full workflow: pre-training a model from scratch, fine-tuning an existing model on your own data, compressing a model so it runs faster or on smaller hardware, and deploying it to production. It supports popular open model families including LLaMA, Qwen, DeepSeek, Mistral, Baichuan, ChatGLM, and Gemma. Recent updates added support for Qwen3 and DeepSeek-R1, including quantized inference that can reach over 2,100 tokens per second on a single machine.

One notable feature is multi-hardware support. The library runs on Nvidia GPUs and on several Chinese-made chips (Kunlun XPU, Ascend NPU, Hygon DCU, and others) through a consistent interface, so you can switch hardware without rewriting your code. This matters most for teams in China that cannot access Nvidia hardware or prefer not to depend on it.

For fine-tuning, it includes an efficient training pipeline built around FlashMask, a custom attention operator that reduces wasted computation on padded sequences. Checkpoints can be saved and restored quickly, and a compression feature cuts their storage footprint by about 78 percent. There is also a model merging tool called MergeKit that combines weights from multiple fine-tuned versions.

The full README is longer than the portion summarized here.
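The summary mentions wasted computation on padded sequences. As a rough illustration of why that waste matters, the sketch below estimates how much of a batch is padding when every sequence is padded to the longest one, and how much of a dense attention score matrix that padding occupies. It is plain Python and does not implement or call FlashMask. The function name `padding_waste` and the example lengths are hypothetical, and the quadratic cost model (one score per token pair) is a simplifying assumption.

```python
from typing import Sequence, Tuple


def padding_waste(seq_lengths: Sequence[int]) -> Tuple[float, float]:
    """Return (token_waste, attention_waste) for a batch padded to max length.

    token_waste:     fraction of token slots that hold padding.
    attention_waste: fraction of dense attention score entries that involve
                     at least one padding position (assumes an L x L score
                     matrix per sequence, where L is the padded length).
    """
    if not seq_lengths:
        raise ValueError("need at least one sequence length")
    if min(seq_lengths) < 0:
        raise ValueError("sequence lengths must be non-negative")
    max_len = max(seq_lengths)
    if max_len == 0:
        raise ValueError("at least one sequence must be non-empty")

    batch = len(seq_lengths)
    # Linear cost: one slot per token position after padding.
    total_tokens = batch * max_len
    real_tokens = sum(seq_lengths)
    token_waste = 1 - real_tokens / total_tokens

    # Quadratic cost: one score per (query, key) pair after padding.
    total_pairs = batch * max_len * max_len
    real_pairs = sum(n * n for n in seq_lengths)
    attention_waste = 1 - real_pairs / total_pairs

    return token_waste, attention_waste


# A batch with one long and three short sequences (made-up lengths).
token_w, attn_w = padding_waste([512, 64, 64, 64])
print(f"padding tokens: {token_w:.1%}, padded attention entries: {attn_w:.1%}")
```

Because attention cost grows with the square of the padded length, the share of wasted attention work is larger than the share of padding tokens, which is the kind of overhead a mask-aware attention operator aims to avoid.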

Copy-paste prompts

Prompt 1

Using PaddleNLP, show me the Python code to fine-tune a LLaMA model on a custom text classification dataset with efficient training enabled.

Prompt 2

I want to run DeepSeek-R1 with quantized inference using PaddleNLP. What are the setup steps and the Python code to load and run the model?

Prompt 3

Using PaddleNLP's MergeKit, show me how to merge weights from two fine-tuned Qwen models into a single combined model.

Prompt 4

I fine-tuned a model with PaddleNLP and the checkpoint is too large. Show me how to use the compression feature that reduces storage by 78%.


Verify against the repo before relying on details.