modelscope/ms-swift

★ 14,104PythonAudience · researcherComplexity · 5/5Setup · hard

Mindmap

mindmap
  root((ms-swift))
    What it does
      LLM fine-tuning
      Full training pipeline
      Eval and deployment
    Models supported
      600+ text models
      400+ multimodal models
      Qwen Llama DeepSeek
    Training techniques
      LoRA and QLoRA
      Flash Attention
      Multi-GPU parallel
      DPO and GRPO
    Deployment
      vLLM inference
      SGLang backend
      OpenAI-compatible API
    Hardware
      Consumer GPUs
      A100 and H100
      Apple MPS
      Ascend NPU

mindmap root((ms-swift)) What it does LLM fine-tuning Full training pipeline Eval and deployment Models supported 600+ text models 400+ multimodal models Qwen Llama DeepSeek Training techniques LoRA and QLoRA Flash Attention Multi-GPU parallel DPO and GRPO Deployment vLLM inference SGLang backend OpenAI-compatible API Hardware Consumer GPUs A100 and H100 Apple MPS Ascend NPU

Click or tap to explore — scroll the page freely

Things people build with this

USE CASE 1

Fine-tune a 7-billion-parameter language model on your own dataset using a single consumer GPU with 9 GB of memory.

USE CASE 2

Train a multimodal model that understands images, video, and audio alongside text using LoRA or QLoRA.

USE CASE 3

Deploy a fine-tuned model behind an OpenAI-compatible API endpoint using vLLM or SGLang for fast inference.

USE CASE 4

Apply DPO or GRPO preference learning to improve a model's reasoning and alignment with human feedback.

Tech stack

PythonPyTorchLoRAvLLMCUDA

Getting it running

Difficulty · hard Time to first run · 1day+

Requires a GPU with CUDA, downloading model weights and configuring the training environment adds significant prep time.

In plain English

ms-swift is a Python framework from the ModelScope community that makes it easier to train and fine-tune large AI language models. Fine-tuning means taking an existing model that was already trained on large amounts of text and then training it further on your own data so it performs better for a specific task. The project covers the full pipeline from training through evaluation, quantization, and deployment so you can take a model from raw weights to a running API endpoint. The library supports over 600 text-only models and over 400 multimodal models (ones that can handle images, video, and audio alongside text). It works with well-known model families such as Qwen3, Llama4, DeepSeek-R1, InternLM3, GLM4.5, and many others. The hardware requirements are flexible: you can run it on consumer GPUs like RTX cards, datacenter GPUs like A100 and H100, Apple MPS, CPU-only machines, and Ascend NPUs. Training large models normally requires enormous amounts of GPU memory. ms-swift addresses this through several techniques. It supports lightweight fine-tuning methods such as LoRA and QLoRA that only update a small fraction of a model weights, keeping memory use low enough that a 7-billion-parameter model can be trained on as little as 9 GB of GPU memory. It also includes options such as Flash Attention, gradient checkpointing, and parallel training strategies that split work across multiple GPUs or machines. Beyond standard instruction fine-tuning, the framework supports reinforcement learning alignment, preference learning methods such as DPO and KTO, embedding and reranker training, and several GRPO-family algorithms used to improve model reasoning. Once a model is trained, ms-swift can quantize it to a smaller size and deploy it behind an OpenAI-compatible API endpoint using vLLM, SGLang, or LmDeploy for faster inference. A web interface is included if you prefer not to work on the command line. The project was accepted at AAAI 2025 and has an associated academic paper. The full README is longer than what was shown.

Copy-paste prompts

Prompt 1

Using ms-swift, write the command to fine-tune Qwen3-7B on a custom instruction dataset in JSONL format using LoRA on a single RTX 3090.

Prompt 2

How do I deploy a fine-tuned model with ms-swift using vLLM to serve an OpenAI-compatible API endpoint on port 8000?

Prompt 3

Show me how to run DPO preference learning on a Llama model using ms-swift with a paired preference dataset in JSON format.

Prompt 4

How do I enable Flash Attention and gradient checkpointing in ms-swift to reduce GPU memory usage during a fine-tuning run?

Prompt 5

What quantization formats does ms-swift support for compressing a fine-tuned model before deployment, and which one should I use for fastest inference?

Open on GitHub → Explain another repo

← modelscope on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.