Pool free tiers from Groq, Cerebras, OpenRouter, and others behind one OpenAI URL
Auto-fail over to another provider when a 429 or 5xx comes back
Point LangChain or Continue at one local endpoint instead of juggling SDKs
Store upstream provider keys encrypted with AES-256-GCM and gate access via bearer tokens
Needs API keys from several free LLM providers and a Node toolchain, plus an admin bearer token to access the dashboard.
FreeLLMAPI is a local proxy server that pulls together the free tiers of about eleven AI providers and exposes them through a single endpoint that looks identical to the OpenAI API. Supported providers include Google, Groq, Cerebras, SambaNova, NVIDIA, Mistral, OpenRouter, GitHub Models, Cohere, Cloudflare, and Z.ai. The README claims the stacked free tiers add up to roughly 1.3 billion tokens per month of working inference capacity. The motivation is that each free tier on its own is small, and juggling fourteen different SDKs, rate limits, and failure modes by hand is painful. With this proxy, any OpenAI-compatible client library, including tools like LangChain or Continue, can be pointed at your local server and routed transparently across whichever provider keys you have added. The routing layer is the main piece of engineering. A Thompson-sampling bandit assigns each model a score drawn from a Beta posterior over its past success rate, adds a normalised speed term in tokens per second, and subtracts any active rate-limit penalty. The stochastic draw means better models tend to win without locking out unproven ones. If the chosen provider returns a 429 error, a 5xx error, or times out, the router skips it, puts the key on a short cooldown, and retries the next model in the fallback chain up to twenty times. Per-key counters track requests and tokens per minute and per day so the router only picks keys that are under their caps. Multi-turn conversations stick to the same model for thirty minutes to avoid the quality drop from mid-conversation switches. Keys are stored in SQLite encrypted with AES-256-GCM and decrypted in memory only when a request needs them. Client apps authenticate with a single bearer token they get from the dashboard, so upstream provider keys never leave the proxy. A separate admin key gates the dashboard routes. Production mode adds CSP and HSTS headers, locks CORS, and hides stack traces. Features not yet supported include embeddings, image generation, audio, vision inputs, legacy completions, moderation, and multi-tenant billing.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.