seanpedersen/freellmapi

Analysis updated 2026-06-24

★ 0 | TypeScript | Audience · developer | Complexity · 4/5 | Setup · moderate

Mindmap

mindmap
  root((freellmapi))
    Inputs
      Provider API keys
      OpenAI-style requests
      Bearer token
    Outputs
      Streamed completions
      Routing decisions
      Usage counters
    Use Cases
      Stack free LLM tiers
      Fall back on 429s
      Route LangChain through one URL
    Tech Stack
      TypeScript
      SQLite
      AES-256-GCM
      OpenAI API


What do people build with it?

USE CASE 1

Pool free tiers from Groq, Cerebras, OpenRouter, and others behind one OpenAI URL

USE CASE 2

Auto-fail over to another provider when a 429 or 5xx comes back

USE CASE 3

Point LangChain or Continue at one local endpoint instead of juggling SDKs

USE CASE 4

Store upstream provider keys encrypted with AES-256-GCM and gate access via bearer tokens
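
To make the encrypted-key use case concrete, here is a minimal sketch of AES-256-GCM encryption and decryption using Node's built-in `crypto` module. The record shape (`EncryptedKey`), the function names, and the base64 encoding are assumptions for illustration, not the project's actual storage format; how the 32-byte master key is provisioned is also left out.

```typescript
import { createCipheriv, createDecipheriv } from "node:crypto";
import { randomBytes } from "node:crypto";

// Hypothetical record shape: what might be stored per provider key.
interface EncryptedKey {
  iv: string;   // base64, 12-byte nonce (must be unique per encryption)
  tag: string;  // base64, GCM authentication tag
  data: string; // base64, ciphertext
}

// Encrypt a provider API key with a 32-byte master key.
function encryptKey(plaintext: string, masterKey: Buffer): EncryptedKey {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Decrypt in memory; throws if the ciphertext or tag was tampered with.
function decryptKey(record: EncryptedKey, masterKey: Buffer): string {
  const decipher = createDecipheriv(
    "aes-256-gcm",
    masterKey,
    Buffer.from(record.iv, "base64"),
  );
  decipher.setAuthTag(Buffer.from(record.tag, "base64"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(record.data, "base64")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}
```

Because GCM is authenticated, a modified ciphertext fails at `decipher.final()` rather than silently decrypting to garbage, which is the main reason to prefer it over an unauthenticated mode for stored secrets.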

What is it built with?

TypeScript, SQLite, AES-256-GCM, OpenAI API, Node

How does it compare?

	seanpedersen/freellmapi	airirang/airirang-builder	aisurfer/mcp_ui_app_example
Stars	0	0	0
Language	TypeScript	TypeScript	TypeScript
Setup difficulty	moderate	moderate	moderate
Complexity	4/5	3/5	3/5
Audience	developer	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

Needs API keys from several free LLM providers and a Node toolchain, plus an admin bearer token to access the dashboard.

In plain English

FreeLLMAPI is a local proxy server that pools the free tiers of about eleven AI providers and exposes them through a single endpoint that mimics the OpenAI API. Supported providers include Google, Groq, Cerebras, SambaNova, NVIDIA, Mistral, OpenRouter, GitHub Models, Cohere, Cloudflare, and Z.ai. The README claims the stacked free tiers add up to roughly 1.3 billion tokens per month of working inference capacity. The motivation is that each free tier on its own is small, and juggling a separate SDK, rate limit, and set of failure modes for every provider by hand is painful. With this proxy, any OpenAI-compatible client library, including tools like LangChain or Continue, can be pointed at your local server and routed transparently across whichever provider keys you have added.

The routing layer is the main piece of engineering. A Thompson-sampling bandit assigns each model a score drawn from a Beta posterior over its past success rate, adds a normalised speed term in tokens per second, and subtracts any active rate-limit penalty. The stochastic draw means better models tend to win without locking out unproven ones. If the chosen provider returns a 429 error, returns a 5xx error, or times out, the router skips it, puts the key on a short cooldown, and retries the next model in the fallback chain up to twenty times. Per-key counters track requests and tokens per minute and per day so the router only picks keys that are under their caps. Multi-turn conversations stick to the same model for thirty minutes to avoid the quality drop from mid-conversation switches.

Keys are stored in SQLite encrypted with AES-256-GCM and decrypted in memory only when a request needs them. Client apps authenticate with a single bearer token they get from the dashboard, so upstream provider keys never leave the proxy. A separate admin key gates the dashboard routes. Production mode adds CSP and HSTS headers, locks CORS, and hides stack traces.
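
The scoring rule described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the `ModelStats` shape, the `speedWeight` value, the uniform Beta(1, 1) prior, and normalising speed by the fastest model are all hypothetical. The Beta draw uses the identity that a ratio of two Gamma variables is Beta distributed, which keeps the sketch dependency-free as long as the success and failure counts are integers.

```typescript
// Hypothetical per-model statistics; field names are illustrative only.
interface ModelStats {
  id: string;
  successes: number;        // integer count of past successful requests
  failures: number;         // integer count of past failed requests
  tokensPerSecond: number;  // observed throughput
  rateLimitPenalty: number; // 0 when no penalty is active
}

type Rng = () => number; // returns a value in [0, 1)

// Gamma(k, 1) for integer k >= 1, as a sum of k exponential draws.
function sampleGammaInt(k: number, rng: Rng): number {
  let sum = 0;
  for (let i = 0; i < k; i++) {
    sum += -Math.log(1 - rng()); // 1 - rng() lies in (0, 1], so the log is finite
  }
  return sum;
}

// Beta(a, b) for integer a, b >= 1, via X / (X + Y) with Gamma draws X and Y.
function sampleBeta(a: number, b: number, rng: Rng): number {
  const x = sampleGammaInt(a, rng);
  const y = sampleGammaInt(b, rng);
  return x + y === 0 ? 0.5 : x / (x + y);
}

// Draw one score per model and return the model with the highest score.
function pickModel(
  models: ModelStats[],
  rng: Rng = Math.random,
  speedWeight = 0.2, // assumed weight; the summary does not state one
): ModelStats {
  if (models.length === 0) throw new Error("no models available");
  const maxTps = Math.max(1, ...models.map((m) => m.tokensPerSecond));
  let best = models[0];
  let bestScore = -Infinity;
  for (const m of models) {
    const quality = sampleBeta(m.successes + 1, m.failures + 1, rng);
    const speed = m.tokensPerSecond / maxTps; // normalised to [0, 1]
    const score = quality + speedWeight * speed - m.rateLimitPenalty;
    if (score > bestScore) {
      bestScore = score;
      best = m;
    }
  }
  return best;
}
```

A model with few observations has a wide posterior, so its draws occasionally come out high and it gets tried. A model with a long record of failures has a narrow posterior near zero and rarely wins.
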
Features not yet supported include embeddings, image generation, audio, vision inputs, legacy completions, moderation, and multi-tenant billing.
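
The fallback behaviour from the summary above (skip on a 429, a 5xx, or a timeout, put the key on a cooldown, try the next entry) might look roughly like this. All names here (`Candidate`, `CallProvider`, `completeWithFallback`) are hypothetical, the 30-second cooldown is an assumed value since the summary only says "short", and reading "up to twenty times" as a cap on total attempts is an interpretation.

```typescript
// Hypothetical types for illustration; not the project's actual API.
interface Candidate { model: string; keyId: string }
interface ProviderResponse { status: number; body: string }
type CallProvider = (c: Candidate) => Promise<ProviderResponse>;

interface FallbackOptions {
  maxAttempts: number;  // cap on provider calls per request
  cooldownMs: number;   // assumed cooldown length
  now: () => number;    // injectable clock, useful for tests
}

const fallbackDefaults: FallbackOptions = {
  maxAttempts: 20,
  cooldownMs: 30000,
  now: Date.now,
};

function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Walk the chain in order, skipping keys that are cooling down.
async function completeWithFallback(
  chain: Candidate[],
  call: CallProvider,
  cooldownUntil: Map<string, number>,
  options: Partial<FallbackOptions> = {},
): Promise<ProviderResponse> {
  const { maxAttempts, cooldownMs, now } = { ...fallbackDefaults, ...options };
  let attempts = 0;
  for (const candidate of chain) {
    if (attempts >= maxAttempts) break;
    if ((cooldownUntil.get(candidate.keyId) ?? 0) > now()) continue;
    attempts++;
    try {
      const res = await call(candidate);
      if (!isRetryable(res.status)) return res; // success or a non-retryable error
    } catch {
      // A thrown error (for example a timeout) is treated as retryable.
    }
    cooldownUntil.set(candidate.keyId, now() + cooldownMs);
  }
  throw new Error(`all providers failed after ${attempts} attempt(s)`);
}
```

Keeping the cooldown map outside the function lets later requests skip a key that just failed instead of rediscovering the rate limit on every call.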

Copy-paste prompts

Prompt 1

Walk me through adding a twelfth provider, including how the bandit picks it up.

Prompt 2

Help me wire LangChain to this proxy and confirm streaming completions work end to end.

Prompt 3

Show me where per-key request and token counters are enforced and how to raise the daily cap.

Prompt 4

Add support for embeddings using the providers that expose a compatible endpoint.

Prompt 5

Tune the Thompson-sampling parameters so a freshly added model gets explored more aggressively.

Frequently asked questions

What is freellmapi?

A local proxy that puts the free tiers of about eleven LLM providers behind a single OpenAI-compatible endpoint. A Thompson-sampling bandit picks models and falls back to another one on a rate-limit or error response.

What language is freellmapi written in?

Mainly TypeScript. The stack also includes SQLite, AES-256-GCM, the OpenAI API, and Node.

How hard is freellmapi to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is freellmapi for?

Mainly developers.


Verify against the repo before relying on details.