Analysis updated 2026-05-18
Run a large open coding model on a rented GPU for a few hours to work on a project, then tear it down to stop paying.
Use aiod tune to find the cheapest quantization and GPU configuration that keeps response time under your target latency.
Connect a GGUF model via llama.cpp to Claude Code without a persistent cloud subscription.
| | jhammant/aiondemandcluster | hadriansecurity/openhack | krishnaik06/image-webscrapper |
|---|---|---|---|
| Stars | 39 | 39 | 39 |
| Language | Python | Python | Python |
| Last pushed | — | — | 2022-12-08 |
| Maintenance | — | — | Dormant |
| Setup difficulty | moderate | moderate | moderate |
| Complexity | 3/5 | 4/5 | 2/5 |
| Audience | developer | developer | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires a vast.ai API key and Claude Code Router installed locally; gated Hugging Face models also need an HF token.
AI on Demand Cluster (aiod) is a command-line tool that rents a GPU on vast.ai, loads any Hugging Face language model onto it, and connects that model to Claude Code. The whole workflow runs from one command: you give it a Hugging Face model name, it figures out how much GPU memory the model needs, finds the cheapest matching machine available right now with a live dollar-per-hour price, rents it, waits for the model to load, and updates your local configuration. From that point, Claude Code talks to your self-hosted model instead of the Anthropic API.
The tool is aimed at developers who want to run large open models (such as Qwen or Llama variants) without paying for a persistent cloud subscription. You pay only for the time the GPU is rented. When you are done, running aiod teardown destroys the instance and stops billing. Cost guard-rails are built in: you can set a maximum hourly price before renting, a time-to-live window after which the instance reminds you to shut down, and an idle timeout that destroys the GPU automatically if it sits unused.
Two inference engines are supported. Safetensors and quantized models (AWQ, fp8, int4) go through vLLM, which serves an OpenAI-compatible API. GGUF files go through llama.cpp. The tool auto-detects which format the model uses. A local proxy called Claude Code Router sits between Claude Code and the remote GPU, translating between the Anthropic and OpenAI API formats so neither side needs special configuration.
For tuning, the aiod tune command rents a machine, sweeps concurrency levels and quantization options, and prints a ranked table of configurations by cost per million output tokens. It recommends the cheapest setup that meets a latency target you specify. A dollar ceiling is required before the sweep starts, and the GPU is destroyed whether the sweep succeeds, fails, or is interrupted.
The tool is in early release and currently installed from source.
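
The ranking that aiod tune reports can be illustrated with a little arithmetic. The sketch below is not aiod's code: the `SweepResult` fields, the function names, and every number are hypothetical, chosen only to show how an hourly price and a measured throughput turn into a cost per million output tokens, and how a latency target filters the candidates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SweepResult:
    """One measured configuration from a hypothetical tuning sweep."""
    label: str                # illustrative name, e.g. "awq / concurrency 8"
    price_per_hour: float     # GPU rental price in dollars per hour
    tokens_per_second: float  # aggregate output throughput
    latency_seconds: float    # measured response latency for this setup


def cost_per_million_tokens(r: SweepResult) -> float:
    """Dollars spent to generate one million output tokens."""
    tokens_per_hour = r.tokens_per_second * 3600
    return r.price_per_hour / tokens_per_hour * 1_000_000


def rank(results: list[SweepResult]) -> list[SweepResult]:
    """Order configurations from cheapest to most expensive per token."""
    return sorted(results, key=cost_per_million_tokens)


def recommend(results: list[SweepResult], max_latency: float) -> Optional[SweepResult]:
    """Cheapest configuration whose latency meets the target, or None."""
    eligible = [r for r in results if r.latency_seconds <= max_latency]
    return rank(eligible)[0] if eligible else None


# Made-up measurements, purely for illustration.
results = [
    SweepResult("fp8 / concurrency 4", 0.90, 500.0, 1.2),
    SweepResult("awq / concurrency 16", 0.90, 1250.0, 3.5),
    SweepResult("awq / concurrency 8", 0.90, 1000.0, 1.8),
]
best = recommend(results, max_latency=2.0)
```

With these invented numbers, the highest-throughput configuration is the cheapest per token but misses a 2-second latency target, so `best` is the next cheapest one that meets it. Higher concurrency usually lowers cost per token at the expense of latency, which is why the two have to be traded off.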
A text-based interface (TUI) is available for people who prefer a menu-driven workflow over command-line flags.
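
The machine-selection step described earlier (estimate the memory a model needs, then pick the cheapest offer that fits under a price cap) can be sketched as follows. How aiod actually estimates memory is not stated here, so the bytes-per-parameter table and the 20% overhead factor are rule-of-thumb assumptions, and the `Offer` class is a hypothetical stand-in rather than the vast.ai API schema.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed rule-of-thumb bytes per parameter for a few weight formats.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


@dataclass
class Offer:
    """A hypothetical GPU offer; not the vast.ai API schema."""
    name: str
    vram_gb: float
    price_per_hour: float


def estimate_vram_gb(params_billions: float, fmt: str, overhead: float = 1.2) -> float:
    """Weight size in GB times an assumed margin for KV cache and activations."""
    return params_billions * BYTES_PER_PARAM[fmt] * overhead


def cheapest_offer(offers: list[Offer], needed_gb: float, max_price: float) -> Optional[Offer]:
    """Cheapest offer with enough VRAM at or under the hourly price cap."""
    fits = [
        o for o in offers
        if o.vram_gb >= needed_gb and o.price_per_hour <= max_price
    ]
    return min(fits, key=lambda o: o.price_per_hour, default=None)


# Invented offers and prices, for illustration only.
offers = [
    Offer("gpu-a-24gb", 24.0, 0.35),
    Offer("gpu-b-48gb", 48.0, 0.60),
    Offer("gpu-c-16gb", 16.0, 0.20),
]
needed = estimate_vram_gb(32, "int4")  # a 32B-parameter model at 4 bits
choice = cheapest_offer(offers, needed, max_price=0.50)
```

Returning `None` when nothing fits under the cap, instead of silently picking a pricier machine, mirrors the idea of a maximum hourly price acting as a hard guard-rail.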
aiod rents a GPU on vast.ai, loads any Hugging Face model with one command, and connects it to Claude Code for pay-by-the-hour AI coding without a persistent cloud subscription.
Mainly Python. The stack also includes vLLM and llama.cpp.
The README does not specify a license for this repository.
Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.
Mainly developers.
Verify against the repo before relying on details.