lugga1s/oxidellm

Analysis updated 2026-05-18

★ 2 · Rust · Audience · developer · Complexity · 4/5 · License · AGPL-3.0 · Setup · moderate

Mindmap

mindmap
  root((oxideLLM))
    What it does
      LLM request gateway
      SSE pass-through
      Multi-upstream failover
    Architecture
      Async telemetry queue
      Background worker
      Zero-copy streaming
    Tech Stack
      Rust
      Axum
      Tokio
      k6 benchmarks
    Audience
      Backend developers
      MLOps engineers

What do people build with it?

USE CASE 1

Route application traffic through a local gateway to Ollama or vLLM with automatic failover when the primary server goes down.

USE CASE 2

Collect telemetry and log AI API usage without adding measurable latency to responses using the async background worker.

USE CASE 3

Run reproducible benchmarks comparing direct-to-LLM versus gateway latency using the included k6 test scripts.
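The failover behaviour in use case 1 can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not oxideLLM's actual code: the names `Attempt`, `should_fail_over`, and `send_with_failover` are hypothetical, the rule "fail over on 429, any 5xx, or an unreachable upstream" is an assumption, and the network call is replaced by a closure so the example runs with the standard library alone.

```rust
/// Simplified outcome of one attempt against one upstream (hypothetical type).
#[derive(Debug, PartialEq)]
enum Attempt {
    /// The upstream answered with this HTTP status.
    Response(u16),
    /// Connection refused, timed out, and so on.
    Unreachable,
}

/// Assumption: rate limiting (429), any 5xx status, or an unreachable
/// upstream moves the request on to the next configured upstream.
fn should_fail_over(attempt: &Attempt) -> bool {
    match attempt {
        Attempt::Unreachable => true,
        Attempt::Response(status) => matches!(*status, 429 | 500..=599),
    }
}

/// Tries each upstream in configuration order and returns the first one
/// whose response does not trigger failover, or `None` if all of them do.
fn send_with_failover<'a, F>(upstreams: &[&'a str], mut send: F) -> Option<(&'a str, u16)>
where
    F: FnMut(&str) -> Attempt,
{
    for &upstream in upstreams {
        let attempt = send(upstream);
        if should_fail_over(&attempt) {
            continue; // try the next upstream in sequence
        }
        if let Attempt::Response(status) = attempt {
            return Some((upstream, status));
        }
    }
    None
}

fn main() {
    // Illustrative local addresses using the default Ollama and vLLM ports.
    let upstreams = ["http://localhost:11434", "http://localhost:8000"];

    // Stand-in for a real HTTP call: the primary is overloaded, the
    // secondary is healthy.
    let result = send_with_failover(&upstreams, |url| {
        if url.ends_with("11434") {
            Attempt::Response(503)
        } else {
            Attempt::Response(200)
        }
    });

    println!("{result:?}");
}
```

Passing the request function in as a closure keeps the retry policy separate from the transport, which also makes the policy easy to test without a running upstream.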

What is it built with?

Rust · Axum · Tokio · JSONL · k6

How does it compare?

	lugga1s/oxidellm	callmealphabet/fastcp	codingstark-dev/decant
Stars	2	2	2
Language	Rust	Rust	Rust
Setup difficulty	moderate	easy	easy
Complexity	4/5	1/5	3/5
Audience	developer	ops, devops	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate · Time to first run · 30 min

Requires the Rust toolchain (cargo build) and a running OpenAI-compatible upstream such as Ollama or vLLM.

Open-source under AGPL-3.0: free to use and modify, but any modified version you distribute or run as a service must also be released as open source.

In plain English

oxideLLM is a Rust program that sits between your application and any OpenAI-compatible AI API (such as Ollama or vLLM). Instead of your application calling the AI service directly, it calls the gateway, which forwards the request onward. The key design goal is to add telemetry, logging, and failover without adding latency to the actual response.

Most gateways slow things down because they do their tracking work on the same connection path as the request. If the gateway must write a log entry or record a database row before it can send a reply, every request waits for that write. oxideLLM avoids this by separating tracking completely from the response path: telemetry events go into a background queue in microseconds, and a separate worker processes them after the response has already been delivered to the caller.

The gateway supports streaming responses (the format where text arrives token by token rather than all at once) and passes those tokens through as raw bytes without decoding and re-encoding each one. It also supports configuring multiple upstream AI services, so if the primary one returns an error or becomes unavailable, the gateway automatically retries the next configured one in sequence.

Benchmarks in the README compare sending requests directly to an AI server versus routing them through the gateway. On a local test setup with 1,000 simultaneous connections, the gateway added about 13% overhead compared to going direct. The project describes these results honestly as local, virtualized measurements that have not yet been validated on bare-metal external servers, and it distinguishes clearly between what is proven and what is still under investigation.

oxideLLM is written in Rust and compiles to a single binary with no runtime dependencies. It is at version 0.9.0 alpha and is licensed under AGPL-3.0, which is a copyleft open-source license.
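The off-path telemetry idea described above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's implementation: oxideLLM is built on Tokio and Axum, whereas this sketch substitutes an OS thread and a bounded `std::sync::mpsc::sync_channel` for whatever queue the gateway really uses. The `TelemetryEvent` fields, the `record` and `spawn_worker` helpers, the queue capacity of 1024, and the drop-when-full policy are all hypothetical.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;
use std::time::Duration;

/// Hypothetical telemetry record; the field names are illustrative only.
#[derive(Debug)]
struct TelemetryEvent {
    request_id: u64,
    status: u16,
    latency: Duration,
}

/// Called on the request path. `try_send` never blocks: if the bounded
/// queue is full (or the worker is gone) the event is dropped and `false`
/// is returned, so a slow telemetry sink cannot stall a response.
fn record(tx: &SyncSender<TelemetryEvent>, event: TelemetryEvent) -> bool {
    tx.try_send(event).is_ok()
}

/// Spawns the background worker. It drains the queue until every sender
/// has been dropped, then returns how many events it processed.
fn spawn_worker(rx: Receiver<TelemetryEvent>) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        let mut processed = 0;
        for event in rx {
            // Slow work, such as appending a JSONL line, would happen here,
            // after the caller has already received its response.
            let _ = (event.request_id, event.status, event.latency);
            processed += 1;
        }
        processed
    })
}

fn main() {
    // Bounded queue: memory use stays capped even if the worker falls behind.
    let (tx, rx) = sync_channel(1024);
    let worker = spawn_worker(rx);

    for id in 0..3 {
        // The response would be sent first; recording is a non-blocking enqueue.
        let queued = record(
            &tx,
            TelemetryEvent { request_id: id, status: 200, latency: Duration::from_millis(12) },
        );
        println!("request {id}: telemetry queued = {queued}");
    }

    drop(tx); // closing the last sender lets the worker finish and exit
    println!("worker processed {} events", worker.join().unwrap());
}
```

Dropping events when the queue is full is one possible trade-off: it favours response latency over complete telemetry. A gateway could instead count dropped events or size the queue for its peak load.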

Copy-paste prompts

Prompt 1

Set up oxideLLM as a gateway in front of my local Ollama instance. What config do I need and what commands start it?

Prompt 2

How does oxideLLM keep telemetry off the critical request path? Explain the bounded ring buffer and background worker design.

Prompt 3

Configure oxideLLM with two upstream AI providers so it automatically retries the second one if the first returns a 503 or 429 error.

Prompt 4

How do I run the included k6 benchmark to measure the latency overhead of oxideLLM versus calling my vLLM instance directly?

Frequently asked questions

What is oxidellm?

A Rust gateway for OpenAI-compatible AI APIs that forwards streaming responses with minimal overhead by keeping telemetry and logging off the main request path.

What language is oxidellm written in?

Mainly Rust. The stack also includes Axum and Tokio.

What license does oxidellm use?

Open-source under AGPL-3.0: free to use and modify, but any modified version you distribute or run as a service must also be released as open source.

How hard is oxidellm to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is oxidellm for?

Mainly developers, in particular backend developers and MLOps engineers.

Verify against the repo before relying on details.