jhammant/aiondemandcluster

Analysis updated 2026-05-18

★ 39, Python, Audience · developer, Complexity · 3/5, Setup · moderate

Mindmap

mindmap
  root((aiod))
    What it does
      Rent GPU on vast.ai
      Load any HF model
      Connect to Claude Code
    Commands
      aiod estimate cost
      aiod spin and teardown
      aiod tune configs
      aiod tui interface
    Engines
      vLLM for safetensors
      llama.cpp for GGUF
    Cost Safety
      Max price cap
      TTL reminder window
      Idle auto-shutdown
    Audience
      Developers
      Budget-conscious coders


What do people build with it?

USE CASE 1

Run a large open coding model on a rented GPU for a few hours to work on a project, then tear it down to stop paying.

USE CASE 2

Use aiod tune to find the cheapest quantization and GPU configuration that keeps response time under your target latency.

USE CASE 3

Connect a GGUF model via llama.cpp to Claude Code without a persistent cloud subscription.
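The selection step in use case 2 can be illustrated with a short sketch. It is not aiod's code: the `SweepResult` fields, the function names, and the sample numbers are all hypothetical, and which latency statistic aiod compares against your target is not stated here. The sketch only shows the arithmetic of ranking configurations by cost per million output tokens and picking the cheapest one that stays under a latency target.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SweepResult:
    """One measured configuration (hypothetical shape, not aiod's own type)."""
    label: str
    dollars_per_hour: float
    output_tokens_per_second: float
    latency_seconds: float


def cost_per_million_output_tokens(result: SweepResult) -> float:
    """Convert an hourly GPU price and a throughput into $ per 1M output tokens."""
    tokens_per_hour = result.output_tokens_per_second * 3600
    return result.dollars_per_hour / tokens_per_hour * 1_000_000


def rank(results: List[SweepResult]) -> List[SweepResult]:
    """Return results ordered from cheapest to most expensive per token."""
    return sorted(results, key=cost_per_million_output_tokens)


def recommend(results: List[SweepResult], max_latency_seconds: float) -> Optional[SweepResult]:
    """Return the cheapest configuration within the latency target, or None."""
    for result in rank(results):
        if result.latency_seconds <= max_latency_seconds:
            return result
    return None


# Made-up numbers, for illustration only.
results = [
    SweepResult("awq-c8", 0.40, 200.0, 1.5),
    SweepResult("fp8-c16", 0.80, 800.0, 4.0),
    SweepResult("fp8-c4", 0.80, 250.0, 1.0),
]
best = recommend(results, max_latency_seconds=2.0)
```

With these sample values the highest-throughput configuration is the cheapest per token but misses the 2-second target, so the recommendation falls to the next cheapest one that meets it.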

What is it built with?

Python, vLLM, llama.cpp, vast.ai, Hugging Face

How does it compare?

	jhammant/aiondemandcluster	hadriansecurity/openhack	krishnaik06/image-webscrapper
Stars	39	39	39
Language	Python	Python	Python
Last pushed	—	—	2022-12-08
Maintenance	—	—	Dormant
Setup difficulty	moderate	moderate	moderate
Complexity	3/5	4/5	2/5
Audience	developer	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate, Time to first run · 30 min

Requires a vast.ai API key and Claude Code Router installed locally. Gated Hugging Face models also need an HF token.

The README does not specify a license for this repository.

In plain English

AI on Demand Cluster (aiod) is a command-line tool that rents a GPU on vast.ai, loads any Hugging Face language model onto it, and connects that model to Claude Code. The whole workflow runs from one command: you give it a Hugging Face model name, it figures out how much GPU memory the model needs, finds the cheapest matching machine available right now with a live dollar-per-hour price, rents it, waits for the model to load, and updates your local configuration. From that point, Claude Code talks to your self-hosted model instead of the Anthropic API.

The tool is aimed at developers who want to run large open models (such as Qwen or Llama variants) without paying for a persistent cloud subscription. You pay only for the time the GPU is rented. When you are done, running aiod teardown destroys the instance and stops billing. Cost guard-rails are built in: you can set a maximum hourly price before renting, a time-to-live window after which the instance reminds you to shut down, and an idle timeout that destroys the GPU automatically if it sits unused.

Two inference engines are supported. Safetensors and quantized models (AWQ, fp8, int4) go through vLLM, which serves an OpenAI-compatible API. GGUF files go through llama.cpp. The tool auto-detects which format the model uses. A local proxy called Claude Code Router sits between Claude Code and the remote GPU, translating between the Anthropic and OpenAI API formats so neither side needs special configuration.

For tuning, the aiod tune command rents a machine, sweeps concurrency levels and quantization options, and prints a ranked table of configurations by cost per million output tokens. It recommends the cheapest setup that meets a latency target you specify. A dollar ceiling is required before the sweep starts, and the GPU is destroyed whether the sweep succeeds, fails, or is interrupted.

The tool is in early release and currently installed from source. A text-based interface (TUI) is available for people who prefer a menu-driven workflow over command-line flags.
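The sizing, engine-detection, and offer-selection steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not aiod's implementation: the weights-plus-overhead memory rule, the 20% overhead figure, the extension-based format check, and every name below (`estimate_vram_gb`, `pick_engine`, `Offer`, `cheapest_offer`) are hypothetical, and no vast.ai or Hugging Face API is called.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def estimate_vram_gb(params_billion: float, bits_per_weight: int,
                     overhead_fraction: float = 0.2) -> float:
    """Rough rule of thumb: weight bytes plus a margin for cache and activations.

    The 20% default margin is an assumption, not a figure from aiod.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * (1 + overhead_fraction)


def pick_engine(filenames: Sequence[str]) -> Optional[str]:
    """Guess the engine from file extensions (assumed heuristic)."""
    lowered = [name.lower() for name in filenames]
    if any(name.endswith(".gguf") for name in lowered):
        return "llama.cpp"
    if any(name.endswith(".safetensors") for name in lowered):
        return "vllm"
    return None


@dataclass(frozen=True)
class Offer:
    """A rentable machine (hypothetical shape, not the vast.ai schema)."""
    gpu_name: str
    total_vram_gb: float
    dollars_per_hour: float


def cheapest_offer(offers: Sequence[Offer], needed_vram_gb: float,
                   max_dollars_per_hour: float) -> Optional[Offer]:
    """Return the cheapest offer with enough memory and under the price cap."""
    eligible = [
        offer for offer in offers
        if offer.total_vram_gb >= needed_vram_gb
        and offer.dollars_per_hour <= max_dollars_per_hour
    ]
    return min(eligible, key=lambda offer: offer.dollars_per_hour, default=None)


# Made-up offers, for illustration only.
offers = [
    Offer("gpu-a", 24.0, 0.30),
    Offer("gpu-b", 48.0, 0.55),
    Offer("gpu-c", 80.0, 1.40),
]
needed = estimate_vram_gb(params_billion=32, bits_per_weight=4)
choice = cheapest_offer(offers, needed, max_dollars_per_hour=1.00)
```

Returning `None` when nothing fits under the cap mirrors the idea of a maximum hourly price: the sketch declines to rent instead of silently exceeding the limit.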

Copy-paste prompts

Prompt 1

I want to use aiod to spin up Qwen2.5-Coder-32B on a rented vast.ai GPU and connect it to Claude Code. Walk me through setup from installing aiod to running ccr code.

Prompt 2

How do I set a maximum price cap and auto-teardown timer with aiod so I don't leave a GPU running and racking up charges overnight?

Prompt 3

Explain how aiod tune works: what does it measure, how does it pick the recommended configuration, and what does --max-cost enforce?

Prompt 4

I want to run a GGUF model on a multi-GPU vast.ai instance using aiod. How do I specify the model and confirm llama.cpp was selected as the engine?

Prompt 5

How do I save a working aiod configuration as a named profile and reuse it later with aiod spin --profile?

Frequently asked questions

What is aiondemandcluster?

aiod rents a GPU on vast.ai, loads any Hugging Face model with one command, and connects it to Claude Code for pay-by-the-hour AI coding without a persistent cloud subscription.

What language is aiondemandcluster written in?

Mainly Python. The stack also includes vLLM and llama.cpp.

What license does aiondemandcluster use?

The README does not specify a license for this repository.

How hard is aiondemandcluster to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is aiondemandcluster for?

Mainly developers.


Verify against the repo before relying on details.