Run a 405-billion-parameter Llama model interactively on a consumer GPU by joining the public Petals network without downloading the full model.
Contribute your GPU to the Petals network to help others run large AI models while sharing compute costs.
Fine-tune a large language model for a specific task using prompt tuning without storing or training the full model locally.
Set up a private Petals swarm among a trusted team to run large AI models without routing data through public volunteer machines.
Requires a CUDA GPU on Linux; Windows requires WSL setup. On the public swarm, data passes through volunteer machines.
Petals is a Python library that lets you run very large AI language models on consumer hardware by spreading the work across multiple computers over the internet, similar to how BitTorrent distributes file downloads across many peers. The models it targets, such as Llama 3.1 (up to 405 billion parameters), Mixtral, Falcon, and BLOOM, are too large to fit on a single consumer GPU. Petals solves this by letting each participant load just a portion of the model's layers, while the system routes data between participants to complete each request.

From a user's perspective, you write code against the Petals library much like you would use standard tools from the Hugging Face Transformers library. You load a model, pass it some text, and get generated output back. The difference is that the heavy computation is happening across a network of volunteer-run machines rather than locally. The project reports inference speeds of up to 6 tokens per second for large models, which is enough for interactive chatbot-style use.

You can also fine-tune models through the network. Petals supports prompt-tuning, which means you can adapt a model's behavior for a specific task without needing to store or train the full model yourself.

Anyone with a GPU can contribute to the network by running the Petals server software, which hosts a slice of a model and serves requests routed to it. Setup instructions are provided for Linux with Anaconda, Windows via the Windows Subsystem for Linux, Docker, and macOS with Apple Silicon. The project runs a public monitor at health.petals.dev showing which models are currently available and how many participants are serving each one.

Privacy is flagged as a consideration: in the public swarm, your data passes through other people's machines. The project has a wiki page covering the privacy implications, and it is possible to run a private swarm among a trusted group if that is a concern. The library is backed by a research paper published at ACL 2023.
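The load-a-model, pass-text, get-output workflow described above can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes the `petals` and `transformers` packages are installed, that the public swarm is currently serving the chosen model, and that you have access to the model's tokenizer on the Hugging Face Hub (Llama models may require accepting a license first). The `generate_text` helper and the `DEFAULT_MODEL` constant are hypothetical names introduced here for illustration; check the repo for the currently recommended model names.

```python
# Sketch of Petals client usage. Assumes `pip install petals` and network access.
# DEFAULT_MODEL is an assumed example; pick a model that the public monitor
# at health.petals.dev currently lists as available.
DEFAULT_MODEL = "meta-llama/Meta-Llama-3.1-405B-Instruct"


def generate_text(prompt, model_name=DEFAULT_MODEL, max_new_tokens=5):
    """Return `prompt` plus a short continuation generated via the Petals swarm."""
    # Imported lazily so this file can be loaded without the heavy dependencies.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Connects to the distributed network; the transformer layers are hosted
    # by remote servers rather than loaded in full on this machine.
    model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

    # Tokenize locally, generate through the swarm, decode locally.
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0])
```

Calling `generate_text("A cat sat")` would send the prompt through the public swarm, so the privacy caveat above applies to whatever text you pass in.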
Verify against the repo before relying on details.