geekgineer/needle-rs

Analysis updated 2026-06-24

★ 26 · Rust · Audience · developer · Complexity · 4/5 · License · MIT · Setup · moderate

Mindmap

mindmap
  root((needle-rs))
    Inputs
      User query
      Tool list JSON
      Safetensors weights
      Vocab file
    Outputs
      Tool name
      Arguments JSON
      280ms decision
    Use Cases
      Browser tool routing
      Edge function calling
      Embedded device AI
      Offline agent
    Tech Stack
      Rust
      WebAssembly
      Node.js
      Python
      Cloudflare Workers


What do people build with it?

USE CASE 1

Add offline tool-calling to a browser app without sending queries to OpenAI

USE CASE 2

Run sub-second function routing on Cloudflare Workers edge nodes

USE CASE 3

Embed tool-selection AI into a no-std Rust firmware target

USE CASE 4

Wrap needle-rs in a Python pipeline to pick tools before hitting a bigger LLM
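Use case 4 describes a two-tier pattern: let the small router decide when it can, and pay for a larger LLM only when it cannot. The Python sketch below shows that control flow, under stated assumptions. `small_router` and `big_llm` are hypothetical stand-ins, for example a wrapper around the needle-rs Python package's run call and a client for a hosted model. Both are assumed to take a query and the tool-list JSON and to return a JSON string with a `name` key. That return shape is an assumption, not the documented needle-rs output format.

```python
import json
from typing import Callable

# Hypothetical router signature: (query, tools_json) -> JSON string decision.
Router = Callable[[str, str], str]


def route(query: str, tools_json: str, small_router: Router, big_llm: Router) -> dict:
    """Ask the small local router first; escalate to the large model only if needed."""
    known = {tool["name"] for tool in json.loads(tools_json)}
    try:
        decision = json.loads(small_router(query, tools_json))
        # Trust the small model only if it picked a tool that actually exists.
        if decision.get("name") in known:
            decision["routed_by"] = "small"
            return decision
    except (json.JSONDecodeError, AttributeError, TypeError):
        pass  # malformed or non-object output: fall through to the large model
    decision = json.loads(big_llm(query, tools_json))
    decision["routed_by"] = "large"
    return decision
```

This sketch escalates only when the small model returns malformed output or names an unknown tool, so most queries stay local. A stricter version could also check the required arguments before trusting the small model.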

What is it built with?

Rust · WebAssembly · Node.js · Python · Cloudflare Workers

How does it compare?

	geekgineer/needle-rs	tonbo-io/ursula	aftertonesignal/brume
Stars	26	25	24
Language	Rust	Rust	Rust
Setup difficulty	moderate	hard	hard
Complexity	4/5	5/5	5/5
Audience	developer	ops/devops	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate · Time to first run · 30 min

Before the first run, you must download the safetensors weights and a vocab file from Hugging Face. The runtime itself is tiny, but loading the model is a separate step.
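Because the weights are a separate download, it can help to sanity-check the file before handing it to the runtime. The sketch below uses only the Python standard library and the published safetensors layout: an 8-byte little-endian header length, then a JSON header, then raw tensor bytes. It does not use needle-rs itself. It counts tensors and parameters and raises an error if the tensor data is shorter than the header promises, which is a common sign of an interrupted download.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Parse the JSON header of a .safetensors file without reading tensor data.

    Returns (header_dict, data_start), where data_start is the byte offset at
    which the raw tensor buffer begins.
    """
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError(f"{path}: too short to be a safetensors file")
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", prefix)
        header_bytes = f.read(header_len)
        if len(header_bytes) != header_len:
            raise ValueError(f"{path}: header is truncated")
    return json.loads(header_bytes.decode("utf-8")), 8 + header_len


def summarize(path):
    """Return (tensor_count, parameter_count); raise if tensor data is missing."""
    header, data_start = read_safetensors_header(path)
    n_tensors = 0
    n_params = 0
    data_end = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        n_tensors += 1
        count = 1
        for dim in info["shape"]:
            count *= dim
        n_params += count
        # data_offsets are [begin, end) relative to the start of the data buffer.
        data_end = max(data_end, info["data_offsets"][1])
    available = Path(path).stat().st_size - data_start
    if available < data_end:
        raise ValueError(
            f"{path}: expected {data_end} bytes of tensor data, found {available} "
            "(incomplete download?)"
        )
    return n_tensors, n_params
```

For example, `summarize("needle/model.safetensors")` returns `(tensor_count, parameter_count)`. The path is hypothetical; point it at wherever the weights were downloaded. You can then compare the parameter count with the roughly 26 million quoted for Needle.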

MIT lets you use, modify, and redistribute the code for any purpose as long as you keep the copyright and license text.

In plain English

needle-rs is a Rust and WebAssembly runtime for Needle, a small AI model from a company called Cactus Compute. Needle is a 26-million-parameter transformer that does exactly one thing: it takes a user's query plus a list of available tools and outputs a JSON object that says which tool to call and with what arguments. This pattern is usually called function calling or tool calling. It is what lets an AI app decide on its own to book a flight, look up the weather, or run any other action you have defined. The whole runtime fits in 258 kilobytes, and the model weights are about 22 megabytes.

The point of the project, as the README presents it, is to make tool-routing AI cheap and portable. Normally you have two options. You can pay for an API such as OpenAI's, which sends data off the user's machine and costs money per call. Or you can ship a local language model that weighs hundreds of megabytes. needle-rs claims similar routing accuracy at a fraction of the size, with around 280 milliseconds of latency, no API key, and no backend. The README is explicit that the model itself (architecture, training, dataset, and weights) is the work of Cactus Compute. It describes this project as a community deployment layer for platforms the official Python implementation does not cover.

It runs in several places:
- in a browser
- in Node.js
- on Cloudflare Workers (small server-side scripts that run on Cloudflare's edge network)
- as a command-line tool on Linux, macOS, and Windows
- as a Python wheel installable through pip
- as a Rust library that can target embedded systems without a standard library (no_std)

Mobile and on-device neural accelerator paths are deferred to Cactus's own engine, which the README presents as complementary.

The quick start shows three usage patterns: an npm package for JavaScript with an init call and a NeedleWasm.load function, a Rust crate called needle-infer, and a Python package also called needle-rs. In each case you load the safetensors weights and a vocab file, then call run with a query and a JSON list of tools. Weights can be pulled from Hugging Face with the huggingface-cli tool or loaded straight from a URL in the browser. The project is MIT licensed.
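In the flow described above, a query and a tool list go in and a JSON decision comes out. The app is still responsible for acting on that decision. The Python sketch below covers that last step: it validates the router's JSON against the tool list and calls a local handler. Several details are assumptions for illustration, not the documented needle-rs format:
- the tool schema (`name` / `description` / `parameters`, a common function-calling shape)
- the output keys `name` and `arguments`
- the handler functions

Under those assumptions, `json.dumps(TOOLS)` would be the JSON list of tools passed to run.

```python
import json

# Hypothetical tool list in a common function-calling shape; check the
# needle-rs README for the exact schema its run call expects.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {
        "name": "set_timer",
        "description": "Start a countdown timer",
        "parameters": {
            "type": "object",
            "properties": {"minutes": {"type": "integer"}},
            "required": ["minutes"],
        },
    },
]

# Hypothetical local handlers, one per tool name.
HANDLERS = {
    "get_weather": lambda args: f"weather lookup for {args['city']}",
    "set_timer": lambda args: f"timer set for {args['minutes']} min",
}


def dispatch(model_output, tools=TOOLS, handlers=HANDLERS):
    """Validate a router decision and run the matching handler.

    Assumes the decision looks like {"name": ..., "arguments": {...}}.
    """
    decision = json.loads(model_output)
    name = decision.get("name")
    args = decision.get("arguments", {})
    if isinstance(args, str):  # tolerate arguments encoded as a JSON string
        args = json.loads(args)
    spec = next((t for t in tools if t["name"] == name), None)
    if spec is None:
        raise ValueError(f"router chose unknown tool {name!r}")
    required = spec["parameters"].get("required", [])
    missing = [p for p in required if p not in args]
    if missing:
        raise ValueError(f"{name}: missing required arguments {missing}")
    return handlers[name](args)
```

Validating before dispatching means a wrong or hallucinated tool name fails loudly instead of silently triggering an action.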

Copy-paste prompts

Prompt 1

Load needle-rs in a browser via the npm package and route a weather query to one of three tools

Prompt 2

Build a Cloudflare Worker that uses needle-rs to choose a tool from a JSON list per request

Prompt 3

Use the needle-infer Rust crate to add tool routing to an existing CLI without a network call

Prompt 4

Fetch the Needle safetensors weights from Hugging Face and load them into a Python script with needle-rs

Prompt 5

Compile needle-rs for a no-std embedded target and benchmark inference latency

Frequently asked questions

What is needle-rs?

A Rust and WebAssembly runtime for the 26M-parameter Needle model, which picks the tool an AI app should call. The runtime is 258 KB, plus 22 MB of weights.

What language is needle-rs written in?

Mainly Rust. The stack also includes WebAssembly, Node.js, Python, and Cloudflare Workers.

What license does needle-rs use?

MIT lets you use, modify, and redistribute the code for any purpose as long as you keep the copyright and license text.

How hard is needle-rs to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is needle-rs for?

Mainly developers.


This repo across BitVibe Labs

Verify against the repo before relying on details.