Build a portable .rag file from a folder of documents and ship it to teammates
Swap between Groq, OpenAI, Gemini, or Anthropic for generation without rebuilding the index
Run a fully local RAG pipeline with Sentence Transformers embeddings
Embed a prebuilt knowledge base inside a desktop or CLI tool with one file
Base install is light, but using local embeddings pulls in heavy Sentence Transformers dependencies you opt into explicitly.
RagBucket is a Python library that tries to make Retrieval-Augmented Generation systems portable. A RAG system is the common setup where you take a pile of documents, break them into chunks, turn the chunks into numerical embeddings, and store those in a vector index so a language model can later look up relevant pieces before answering a question. The author's complaint is that every machine learning model format, such as .pt, .onnx, .gguf, and .h5, is portable by default, but a typical RAG pipeline is spread across vector databases, embedding scripts, chunking configs, and provider integrations that all have to be rebuilt when you move between machines.

The library's answer is a single file format called .rag. A .rag artifact bundles three things into one file: a FAISS vector index, the chunked documents as JSON, and a manifest that records the embedding configuration, model info, and version. The pitch is that you build it once, ship the file around, and then load and query it anywhere with one line of code, with no external vector database to set up.

The quickstart shows two short Python scripts. The first uses a RagBuilder with a RagConfig that picks an embedding provider, chunk size and overlap, and top_k retrieval setting, and then writes the .rag file from a folder of documents. The second uses a RagRuntime that loads the .rag file, attaches a generation provider such as Groq with a Llama 3.1 model, and exposes a rag.ask method that takes a question and returns an answer.

RagBucket separates retrieval from generation, so the embedding and generation sides can be mixed and matched. Supported generation providers include Groq, OpenAI, Gemini, and Anthropic, with example models listed for each. Supported embedding providers include a local Sentence Transformers option, Cohere, OpenAI, Gemini, and Voyage. The base install stays light: heavy local embedding dependencies are only pulled in if you set the embedding provider to local.
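To make the single-file idea and the chunk size and overlap settings concrete, here is a minimal sketch using only the Python standard library. It is not RagBucket's API or its actual .rag layout, neither of which is specified here. The names `chunk_text`, `write_bundle`, and `read_bundle`, the zip container, the entry names `manifest.json`, `chunks.json`, and `index.bin`, and the manifest keys are all assumptions made for illustration. The vector index is stood in for by opaque bytes rather than a real FAISS index.

```python
import io
import json
import zipfile


def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def write_bundle(chunks: list[str], manifest: dict, index_bytes: bytes = b"") -> bytes:
    """Pack manifest, chunks, and a serialized index into one archive.

    Hypothetical layout: a zip with three entries. A real build would
    put a serialized vector index where index_bytes goes.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest))
        zf.writestr("chunks.json", json.dumps(chunks))
        zf.writestr("index.bin", index_bytes)
    return buf.getvalue()


def read_bundle(data: bytes) -> tuple[dict, list[str], bytes]:
    """Unpack the three parts of the hypothetical bundle."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        chunks = json.loads(zf.read("chunks.json"))
        index_bytes = zf.read("index.bin")
    return manifest, chunks, index_bytes


if __name__ == "__main__":
    pieces = chunk_text("abcdefghij", chunk_size=4, overlap=1)
    # Manifest keys below are illustrative, not RagBucket's schema.
    manifest = {"embedding_provider": "local", "chunk_size": 4, "chunk_overlap": 1}
    blob = write_bundle(pieces, manifest)
    loaded_manifest, loaded_chunks, _ = read_bundle(blob)
    print(loaded_manifest, loaded_chunks)
```

Storing the embedding configuration next to the chunks is what lets a loader check that queries are embedded with the same model the index was built with. The generation provider, which is not part of the bundle, can then change freely.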
Installation is through uv or pip as the ragbucket package, the project is MIT licensed, and the repository has zero stars at the time of writing.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.