Add offline tool-calling to a browser app without sending queries to OpenAI
Run sub-second function routing on Cloudflare Workers edge nodes
Embed tool-selection AI into a no_std Rust firmware target
Wrap needle-rs in a Python pipeline to pick tools before hitting a bigger LLM
You must download safetensors weights and a vocab file from Hugging Face before the first run; the runtime itself is tiny but model loading is a separate step.
needle-rs is a Rust and WebAssembly runtime for Needle, a small AI model from Cactus Compute. Needle is a 26-million-parameter transformer that does exactly one thing: it takes a user's query plus a list of available tools and outputs a JSON object that says which tool to call and with what arguments. This pattern is usually called function calling or tool calling, and it is what lets an AI app decide on its own to book a flight, look up the weather, or run any other action you have defined. The whole runtime fits in 258 kilobytes and the model weights are about 22 megabytes.
The point of the project, as the README presents it, is to make tool-routing AI cheap and portable. Normally you either pay an API like OpenAI to do this, which sends data off the user's machine and costs money per call, or you ship hundreds of megabytes of a local language model. needle-rs claims similar routing accuracy at a fraction of the size and around 280 milliseconds of latency, with no API key and no backend.
The README is explicit that the model itself, including architecture, training, dataset, and weights, is the work of Cactus Compute, and that this project is a community deployment layer for places the official Python implementation does not cover. It runs in a browser, in Node.js, on Cloudflare Workers (small server-side scripts that run on Cloudflare's edge), as a command-line tool on Linux, macOS, and Windows, as a Python wheel installable through pip, and as a Rust library that can target embedded systems without a standard library. Mobile and on-device neural accelerator paths are deferred to Cactus's own engine, which the README presents as complementary.
The quick start shows three usage patterns: an npm package for JavaScript with an init call and a NeedleWasm.load function, a Rust crate called needle-infer, and a Python package named needle-rs, like the project itself. In each case you load the safetensors weights and a vocab file, then call run with a query and a JSON list of tools. Weights can be pulled from Hugging Face with the huggingface-cli tool or loaded straight from a URL in the browser. The project is MIT licensed.
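The run step described above hands back a JSON object naming a tool and its arguments, and the application still has to act on it. The sketch below shows one way that consuming side might look in Python. It does not call needle-rs at all: the model output is a hard-coded string, and the tool schema, the "name" and "arguments" keys, and the helpers get_weather, book_flight, and dispatch are all hypothetical names chosen for illustration. Check the repo for the actual schema and call signatures.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool list of the kind passed to the model alongside the query.
# The field names are an assumption, not the schema needle-rs requires.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {"city": {"type": "string"}},
    },
    {
        "name": "book_flight",
        "description": "Book a flight between two airports",
        "parameters": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
        },
    },
]


def get_weather(city: str) -> str:
    # Placeholder action; a real app would call a weather service here.
    return f"weather for {city}"


def book_flight(origin: str, destination: str) -> str:
    # Placeholder action; a real app would call a booking service here.
    return f"flight {origin}->{destination}"


# Map each tool name to the local function that implements it.
HANDLERS: Dict[str, Callable[..., Any]] = {
    "get_weather": get_weather,
    "book_flight": book_flight,
}


def dispatch(raw_output: str, handlers: Dict[str, Callable[..., Any]]) -> Any:
    """Parse a tool-call JSON string and invoke the matching handler."""
    call = json.loads(raw_output)  # invalid JSON raises a ValueError subclass
    name = call.get("name")
    args = call.get("arguments", {})
    # Never execute a tool the app did not register.
    if name not in handlers:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return handlers[name](**args)


# Stand-in for what the model might emit for "What's the weather in Oslo?".
example_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
result = dispatch(example_output, HANDLERS)
print(result)
```

Checking the tool name against a registry before calling anything keeps a malformed or unexpected model output from triggering an action the app never defined, which matters more when the router is a small model running unattended on the client.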
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.