Fine-tune a LLaMA, Qwen, or DeepSeek model on your own dataset using PaddleNLP's efficient training pipeline.
Run quantized DeepSeek-R1 inference at over 2,100 tokens per second on a single machine.
Train and deploy models on Chinese-made AI chips like Kunlun XPU or Ascend NPU using the same code you would run on Nvidia GPUs.
Merge weights from multiple fine-tuned model versions into a single model using the built-in MergeKit tool.
Requires an Nvidia GPU or a supported Chinese AI chip; the README is primarily in Chinese.
PaddleNLP is a Python library for building, training, and running large language models. It comes from Baidu's PaddlePaddle AI team and is designed to make modern language models practical for real-world applications. The README is written primarily in Chinese, reflecting the project's origin and main user community.

The library covers the full workflow: pre-training a model from scratch, fine-tuning an existing model on your own data, compressing a model so it runs faster or on smaller hardware, and deploying it for production use. It supports popular open model families including LLaMA, Qwen, DeepSeek, Mistral, Baichuan, ChatGLM, Gemma, and others. Recent updates added support for Qwen3 and DeepSeek-R1, including quantized inference that can reach over 2,100 tokens per second on a single machine.

One notable feature is multi-hardware support. The library runs on Nvidia GPUs as well as several Chinese-made chips (Kunlun XPU, Ascend NPU, Hygon DCU, and others), with a consistent interface that lets you switch hardware without rewriting your code. This is particularly relevant for teams in China that may not have access to Nvidia hardware or may not want to depend on it.

For fine-tuning, the library includes an efficient training pipeline with FlashMask, a custom attention operator that reduces wasted computation on padded sequences. Checkpoints can be saved and restored quickly, and a compression feature cuts checkpoint storage space by about 78 percent. There is also a model merging tool called MergeKit that combines weights from multiple fine-tuned versions into a single model.

This summary covers only part of the full README.
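
To make the idea of merging weights from several fine-tuned versions concrete, here is a minimal sketch of one common merging strategy: a weighted linear average of matching parameters. This is not MergeKit's API, and it is not necessarily the method PaddleNLP uses. The function name `merge_linear`, the plain-dictionary "state dict" format, and the parameter name `layer.weight` are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def merge_linear(
    state_dicts: List[StateDict],
    weights: Optional[List[float]] = None,
) -> StateDict:
    """Return a weighted average of several checkpoints with identical layouts.

    Illustrative only: real merging tools operate on framework tensors and
    may offer other strategies besides linear averaging.
    """
    if not state_dicts:
        raise ValueError("need at least one state dict")
    if weights is None:
        # Default to a plain (unweighted) average.
        weights = [1.0] * len(state_dicts)
    if len(weights) != len(state_dicts):
        raise ValueError("one weight is required per state dict")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Normalize so the merge coefficients sum to 1.
    coeffs = [w / total for w in weights]

    reference = state_dicts[0]
    for sd in state_dicts[1:]:
        if set(sd) != set(reference):
            raise ValueError("all state dicts must have the same parameter names")

    merged: StateDict = {}
    for name, ref_values in reference.items():
        tensors = [sd[name] for sd in state_dicts]
        if any(len(t) != len(ref_values) for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all checkpoints.
        merged[name] = [
            sum(c * t[i] for c, t in zip(coeffs, tensors))
            for i in range(len(ref_values))
        ]
    return merged


if __name__ == "__main__":
    version_a = {"layer.weight": [1.0, 3.0]}
    version_b = {"layer.weight": [3.0, 5.0]}
    print(merge_linear([version_a, version_b]))
```

Passing `weights=[3, 1]` would favor the first checkpoint three to one. Consult the repository for the merging methods and configuration that MergeKit actually supports.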
Verify against the repo before relying on details.