microsoft/lora

★ 13,517PythonAudience · researcherComplexity · 4/5Setup · moderate

Mindmap

mindmap
  root((repo))
    What it does
      Efficient fine-tuning
      Freeze base model
      Small adapter layers
    How it works
      Low-rank matrices
      No inference slowdown
      Swap per task
    Tech stack
      Python PyTorch
      Hugging Face
    Limitations
      PyTorch only
      Research codebase
      Use PEFT in practice

mindmap root((repo)) What it does Efficient fine-tuning Freeze base model Small adapter layers How it works Low-rank matrices No inference slowdown Swap per task Tech stack Python PyTorch Hugging Face Limitations PyTorch only Research codebase Use PEFT in practice

Click or tap to explore — scroll the page freely

Things people build with this

USE CASE 1

Fine-tune a large language model on your own dataset using a fraction of the compute that full retraining would require.

USE CASE 2

Train separate LoRA adapters for multiple tasks and swap them onto the same base model at inference time with no speed penalty.

USE CASE 3

Integrate LoRA layers into an existing PyTorch model to add parameter-efficient fine-tuning capability.

Tech stack

PythonPyTorchHugging Face

Getting it running

Difficulty · moderate Time to first run · 1h+

Requires PyTorch and transformer model familiarity, most practitioners now use the Hugging Face PEFT library instead of this original research repo.

In plain English

LoRA (Low-Rank Adaptation) is a technique for adapting large pre-trained AI language models to specific tasks without retraining the entire model. Training a large language model from scratch or fully retraining it for a new task requires enormous computing resources and produces a large set of updated weights that must be stored separately for each task. LoRA solves this by freezing the original model weights and inserting small trainable components that learn the task-specific adjustments. These added components are much smaller than the original model, so they are faster to train and cheaper to store. The practical benefit is that you can adapt the same base model to many different tasks by training and keeping only a small set of LoRA weights for each task, then swapping them in at inference time. This does not slow down the model when it is running (no inference latency is added), which is an advantage over some other adaptation approaches. The repository contains a Python package called loralib that implements the LoRA layers, along with examples showing how to integrate it into existing models built with PyTorch, including popular models available through Hugging Face. The paper accompanying this code showed results matching or exceeding full fine-tuning on standard language benchmarks while training fewer than one percent of the parameters that full fine-tuning would require. Note that this is the original research repository from Microsoft. The technique has since been incorporated into the Hugging Face PEFT (Parameter-Efficient Fine-Tuning) library, which is where most practitioners now access it. Only PyTorch is supported in this repository.

Copy-paste prompts

Prompt 1

How do I use loralib to apply LoRA to a Hugging Face transformer and fine-tune it on my custom text dataset?

Prompt 2

Write Python code with loralib to replace linear layers in a PyTorch model with LoRA layers and set the rank.

Prompt 3

What values should I use for LoRA rank and alpha when fine-tuning a language model for text classification?

Prompt 4

How do I save only the LoRA adapter weights and reload them onto the base model later in PyTorch?

Open on GitHub → Explain another repo

← microsoft on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.