Analysis updated 2026-05-18
Route application traffic through a local gateway to Ollama or vLLM with automatic failover when the primary server goes down.
Collect telemetry and log AI API usage through the async background worker, without adding measurable latency to responses.
Run reproducible benchmarks comparing direct-to-LLM versus gateway latency using the included k6 test scripts.
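The automatic failover mentioned above amounts to trying each configured upstream in order and returning the first success. The sketch below illustrates only that ordering logic. It is not oxideLLM's code. The `Upstream` struct, the `forward_with_failover` function, and the `send` closure (which stands in for the real HTTP call) are hypothetical. The URLs assume Ollama's and vLLM's default local ports with their OpenAI-compatible `/v1` prefix.

```rust
// Hypothetical upstream descriptor; oxideLLM's real config format may differ.
#[derive(Debug)]
struct Upstream {
    name: &'static str,
    base_url: &'static str,
}

/// Try each upstream in the configured order and return the first success.
/// `send` is a stand-in for the real HTTP request (assumption for this sketch).
/// If every upstream fails, return the collected error messages.
fn forward_with_failover<F>(upstreams: &[Upstream], mut send: F) -> Result<String, Vec<String>>
where
    F: FnMut(&Upstream) -> Result<String, String>,
{
    let mut errors = Vec::new();
    for up in upstreams {
        match send(up) {
            Ok(body) => return Ok(body),
            Err(e) => errors.push(format!("{}: {}", up.name, e)),
        }
    }
    Err(errors)
}

fn main() {
    let upstreams = [
        Upstream { name: "ollama", base_url: "http://localhost:11434/v1" },
        Upstream { name: "vllm", base_url: "http://localhost:8000/v1" },
    ];
    // Simulate the primary (Ollama) being down so the request falls through to vLLM.
    let result = forward_with_failover(&upstreams, |up| {
        if up.name == "ollama" {
            Err("connection refused".to_string())
        } else {
            Ok(format!("response from {}", up.base_url))
        }
    });
    println!("{:?}", result);
}
```

Returning the collected errors when every upstream fails lets the caller report why the whole chain was exhausted, rather than only the last failure.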
| | lugga1s/oxidellm | callmealphabet/fastcp | codingstark-dev/decant |
|---|---|---|---|
| Stars | 2 | 2 | 2 |
| Language | Rust | Rust | Rust |
| Setup difficulty | moderate | easy | easy |
| Complexity | 4/5 | 1/5 | 3/5 |
| Audience | developer | ops / DevOps | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires the Rust toolchain (cargo build) and a running OpenAI-compatible upstream such as Ollama or vLLM.
oxideLLM is a Rust program that sits between your application and any OpenAI-compatible AI API, such as Ollama or vLLM. Instead of calling the AI service directly, your application calls the gateway, which forwards each request upstream.
The key design goal is to add telemetry, logging, and failover without adding latency to the actual response. Many gateways slow things down because they do their tracking work on the request path: if the gateway must write a log entry or a database row before it can reply, every request waits for that write. oxideLLM separates tracking from the response path entirely. Telemetry events are pushed onto a background queue, which takes microseconds, and a separate worker processes them after the response has already been delivered to the caller.
The gateway supports streaming responses (where text arrives token by token rather than all at once) and passes the streamed tokens through as raw bytes, without decoding and re-encoding each one. It also accepts multiple upstream AI services: if the primary returns an error or becomes unavailable, the gateway automatically retries the next configured upstream in sequence.
Benchmarks in the README compare sending requests directly to an AI server with routing them through the gateway. On a local test setup with 1,000 concurrent connections, the gateway added about 13% overhead relative to going direct. The project states that these are local, virtualized measurements not yet validated on external bare-metal servers, and it separates what has been demonstrated from what is still under investigation.
oxideLLM compiles to a single binary with no runtime dependencies. It is at version 0.9.0 (alpha) and is licensed under AGPL-3.0, a copyleft open-source license.
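The pattern described above keeps telemetry off the response path: a non-blocking enqueue, with a worker that drains the queue later. The sketch below is a minimal illustration of that pattern using only the standard library's bounded channel and an OS thread. The real project lists Tokio in its stack and presumably uses async tasks instead. `TelemetryEvent`, its fields, and `record` are hypothetical names. Dropping events when the queue is full is an assumed policy, not something the project documents.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;

// Hypothetical telemetry record; field names are illustrative only.
#[derive(Debug, Clone)]
struct TelemetryEvent {
    upstream: String,
    status: u16,
    latency_ms: u64,
}

/// Enqueue an event without ever blocking the caller.
/// If the queue is full (or the worker is gone) the event is dropped,
/// so the request path never waits on telemetry (assumed policy).
fn record(tx: &SyncSender<TelemetryEvent>, ev: TelemetryEvent) -> bool {
    match tx.try_send(ev) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}

fn main() {
    // Bounded queue so memory stays capped under load.
    let (tx, rx) = sync_channel::<TelemetryEvent>(1024);

    // Background worker: processes events independently of request handling.
    let worker = thread::spawn(move || {
        let mut count = 0usize;
        for ev in rx {
            // A real worker might write to a log file or database here.
            println!("logged {:?}", ev);
            count += 1;
        }
        count
    });

    for i in 0..3 {
        // In a gateway, the response would already be sent before this call.
        record(&tx, TelemetryEvent { upstream: "ollama".into(), status: 200, latency_ms: 40 + i });
    }

    drop(tx); // closing the sender ends the worker's loop
    let processed = worker.join().unwrap();
    println!("worker processed {} events", processed);
}
```

Using `try_send` on a bounded channel is what keeps the enqueue cheap and predictable. An unbounded queue would also never block, but it could grow without limit if the worker falls behind.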
A Rust gateway for OpenAI-compatible AI APIs that forwards streaming responses with minimal overhead by keeping telemetry and logging off the main request path.
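Forwarding streaming responses with minimal overhead comes down to relaying each chunk of upstream bytes to the client as it arrives, without parsing it. The blocking, std-only sketch below shows that idea. oxideLLM itself is built on Axum and Tokio, so its actual relay is asynchronous and will look different. The `relay_stream` name is hypothetical, and the sample payload is a simplified imitation of server-sent events, not the exact OpenAI chunk schema.

```rust
use std::io::{self, Cursor, Read, Write};

/// Relay an upstream byte stream to the client chunk by chunk.
/// Bytes are copied as-is: no SSE parsing, no JSON decode/re-encode.
/// Each chunk is flushed immediately so tokens reach the client promptly.
fn relay_stream<R: Read, W: Write>(mut upstream: R, mut client: W) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = match upstream.read(&mut buf) {
            Ok(0) => break, // upstream closed the stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        client.write_all(&buf[..n])?;
        client.flush()?;
        total += n as u64;
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // Simplified stand-in for an upstream's streamed events.
    let sse: &[u8] = b"data: {\"delta\":\"Hel\"}\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n\n";
    let mut client_out = Vec::new();
    let copied = relay_stream(Cursor::new(sse), &mut client_out)?;
    // The client receives exactly the bytes the upstream sent.
    assert_eq!(client_out, sse);
    println!("relayed {} bytes unchanged", copied);
    Ok(())
}
```

Skipping per-token decoding means the gateway's cost per chunk is roughly one read, one write, and one flush, independent of the payload format.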
Mainly Rust. The stack also includes Axum and Tokio.
Open-source under AGPL-3.0: free to use and modify, but if you distribute a modified version or let users interact with it over a network, you must make its source code available under the same license.
Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.
Audience: mainly developers.
Verify against the repo before relying on details.