explaingit

arcee-ai/mergekit

7,077 · Python · Audience: researcher · Complexity: 4/5 · License · Setup: moderate

TLDR

A Python toolkit for combining multiple AI language models into one by mathematically blending their weights. You describe the merge in a YAML config, run it (on a CPU if needed), and get a single merged model out.

Mindmap

mindmap
  root((mergekit))
    What it does
      Merge LLM weights
      YAML configuration
      No extra training
    Merge Methods
      Weighted average
      Frankenmerging
      Evolutionary search
    Hardware
      CPU supported
      Low GPU memory
      Streaming loading
    Use Cases
      Combine model skills
      Custom model builds
      Upload to HuggingFace

Things people build with this

USE CASE 1

Blend two fine-tuned language models into one model that inherits capabilities from both.

USE CASE 2

Build a Frankenmerge model by taking specific layers from different base models in a custom order.

USE CASE 3

Use evolutionary merging to automatically search for the best combination of model weights for a target task.

USE CASE 4

Transplant the tokenizer and vocabulary from one language model onto another using the TokenSurgeon tool.
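Use case 2 (layer stacking, or Frankenmerging) can be pictured with a toy sketch. This is not mergekit's code or API: the `stack_layers` helper and the two 32-layer "models" are hypothetical, and plain strings stand in for real weight tensors. It only illustrates the idea of taking layer ranges from different models in a chosen order.

```python
# Toy illustration of layer stacking ("Frankenmerging"); not mergekit's code.
# Each "model" is a list of layer names standing in for real weight tensors.

def stack_layers(slices):
    """Concatenate layer ranges taken from several models, in the given order.

    `slices` is a list of (layers, start, end) tuples; `end` is exclusive.
    """
    merged = []
    for layers, start, end in slices:
        if not 0 <= start <= end <= len(layers):
            raise ValueError(f"layer range [{start}, {end}) is out of bounds")
        merged.extend(layers[start:end])
    return merged


# Two hypothetical 32-layer models, A and B.
model_a = [f"A.layer{i}" for i in range(32)]
model_b = [f"B.layer{i}" for i in range(32)]

# First 16 layers from A, the remaining 16 from B.
merged = stack_layers([(model_a, 0, 16), (model_b, 16, 32)])
print(len(merged), merged[15], merged[16])  # 32 A.layer15 B.layer16
```

In a real merge the ranges would be declared in the YAML config rather than in Python, and the source models generally need compatible architectures for the stacked result to load.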

Tech stack

Python · YAML · Hugging Face

Getting it running

Difficulty · moderate
Time to first run · 1h+

Requires enough disk space and RAM to hold the model weights. A GPU is optional but speeds up the merge process.
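The storage requirement is mostly arithmetic: parameter count times bytes per parameter. The sketch below is a rough estimate only. The `weights_size_gb` helper is hypothetical, it assumes weights stored in float16 (2 bytes each), and it does not model how mergekit actually loads or streams tensors.

```python
# Back-of-the-envelope size estimate: parameters x bytes per parameter.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weights_size_gb(num_params, dtype="float16"):
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


size_7b = weights_size_gb(7e9)
print(f"One 7B model in float16: ~{size_7b:.0f} GB")
# Assumes two input models and one output model of the same size on disk.
print(f"Two inputs plus one output: ~{3 * size_7b:.0f} GB of disk")
```

Under these assumptions a merge of two 7B models touches roughly 42 GB of files, which is why disk space tends to matter more than GPU memory.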

Free to use and modify, including in commercial software, but if you distribute a modified version of mergekit itself, it must remain open source under the LGPL v3.

In plain English

Large language models, the kind that power AI chat tools, are trained at great expense to develop particular strengths. One model might excel at following instructions, another at coding, another at creative writing. Normally, combining those strengths would require either running multiple models at once (expensive) or doing additional training that requires the original training data.

Model merging is a different approach: you take the internal numerical weights of two or more models and mathematically blend them to produce a single new model that can inherit capabilities from all of them. The resulting model runs at the same speed and cost as a single model.

mergekit is a Python toolkit that automates this process. You write a short configuration file in YAML format describing which models to combine, how much weight to give each, which merging method to apply, and other options. The tool then handles loading the models, performing the merge operation, and writing the result to a new folder. From there you can test it locally or upload it to the Hugging Face Hub, a popular platform for sharing AI models, using commands the README provides.

The toolkit supports several merging methods: simple weighted averaging of model weights; layer-by-layer construction, called Frankenmerging, that takes specific layers from different models; and evolutionary approaches that automatically search for the best combination of merging parameters by testing different options and evaluating the results. It also includes a tool called TokenSurgeon for transplanting the vocabulary and tokenizer from one model onto another.

The tool is designed to work on constrained hardware. Merges can run on a regular CPU without any dedicated graphics card, or with as little as 8 gigabytes of GPU memory, because the tool loads only the parts of each model it needs at a given moment rather than keeping everything in memory at once.

A hosted web version called FrankensteinAI is available for users who do not want to set up the toolkit locally. The project is licensed under the GNU LGPL v3.
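The simplest method mentioned above, weighted averaging, can be sketched in a few lines. This is a toy illustration, not mergekit's implementation: the `weighted_average` helper, the parameter names, and the two tiny "models" are all hypothetical, and plain lists of floats stand in for real tensors.

```python
# Toy weighted-average merge; not mergekit's implementation.
# Each "model" is a dict mapping a parameter name to a flat list of floats.

def weighted_average(models, weights):
    """Blend same-shaped models parameter by parameter."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    merged = {}
    for name in models[0]:
        tensors = [m[name] for m in models]
        # zip(*tensors) walks the same position in every model at once.
        merged[name] = [
            sum(w * v for w, v in zip(weights, values)) / total
            for values in zip(*tensors)
        ]
    return merged


model_a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
model_b = {"layer0.weight": [3.0, 6.0], "layer0.bias": [1.0]}

merged = weighted_average([model_a, model_b], weights=[0.5, 0.5])
print(merged)  # {'layer0.weight': [2.0, 4.0], 'layer0.bias': [0.5]}
```

Dividing by the sum of the weights keeps the result on the same scale as the inputs even when the weights do not add up to 1. The blend only makes sense when every model has the same parameter names and shapes, which is why merges of this kind are usually done between fine-tunes of the same base model.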

Copy-paste prompts

Prompt 1
Write a mergekit YAML config that combines two Llama-3 8B models using TIES merging with equal weight and run it on CPU.
Prompt 2
How do I run mergekit on a machine with only 8GB of GPU memory to merge two 7B parameter models without running out of memory?
Prompt 3
After running mergekit, how do I upload the resulting merged model folder to the Hugging Face Hub?
Prompt 4
What is Frankenmerging in mergekit and how do I configure it to take the first 16 layers from model A and the remaining layers from model B?
Prompt 5
How do I use mergekit's evolutionary merging mode to automatically optimize a merge for coding benchmark performance?

Verify against the repo before relying on details.