Run SQL analytics on terabytes of data in Amazon S3 without managing separate database infrastructure.
Call an AI model from inside a SQL query using a Python sandbox function applied to an entire table.
Create a snapshot branch of production data to test a transformation safely without touching live data.
Run full-text search and vector similarity search on the same dataset using standard SQL statements.
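The idea of running a keyword filter and vector similarity ranking over the same rows can be sketched in plain Python. This is a toy, in-memory illustration of the concept, not Databend's SQL syntax or index structures. The documents, their 3-dimensional "embeddings", and the helper names are hypothetical. The keyword step is simple token matching, not a real full-text index with relevance scoring.

```python
import math

# Hypothetical documents, each with text and a hand-made 3-dimensional "embedding".
DOCS = [
    {"id": 1, "text": "rust data warehouse on s3", "vec": [0.9, 0.1, 0.0]},
    {"id": 2, "text": "vector search in sql", "vec": [0.1, 0.9, 0.2]},
    {"id": 3, "text": "full-text search in sql", "vec": [0.2, 0.8, 0.1]},
    {"id": 4, "text": "python udf sandbox", "vec": [0.0, 0.2, 0.9]},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(docs, keyword, query_vec, k=2):
    """Toy hybrid search: keyword filter first, then rank survivors by similarity."""
    # Step 1: keep rows whose text contains the keyword as a whole token
    # (a stand-in for a full-text match, not an inverted index).
    matches = [d for d in docs if keyword in d["text"].split()]
    # Step 2: order the remaining rows by cosine similarity to the query vector.
    ranked = sorted(
        matches,
        key=lambda d: cosine_similarity(d["vec"], query_vec),
        reverse=True,
    )
    return [d["id"] for d in ranked[:k]]
```

For example, `hybrid_search(DOCS, "search", [0.2, 0.8, 0.1])` keeps documents 2 and 3 and ranks document 3 first, because its vector points in exactly the same direction as the query.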
Production use requires cloud object storage (S3, Azure Blob, or GCS); a local pip install is available for development and testing.
Databend is an open-source data warehouse built in Rust, designed to store and analyze large amounts of data in cloud object storage such as Amazon S3, Azure Blob, or Google Cloud Storage. A data warehouse is a database system built for analytical queries: it is optimized for reading and summarizing large datasets rather than for fast lookups of individual records. Databend handles that kind of workload and also provides vector search and full-text search in the same engine, so you do not need separate systems for those tasks.

One distinctive feature is what the README calls an "agent-ready" architecture. You can write Python functions inside the database using sandbox UDFs (user-defined functions). These functions run in isolated containers, and you call them from regular SQL queries. The README's example defines a function that could call an AI model and then runs it over a table of data with a single SQL statement. This lets you combine data processing and AI logic without moving data out to a separate application.

Databend also supports data branching, which the README describes as version control for data. You can create a snapshot of production data and let processes run on that snapshot without affecting the live data, much like creating a branch in a code repository.

There are several ways to get started: a Python package you can install with pip for local development, a Docker image for running the full system locally, and a hosted cloud service. The README describes the cloud version as production-ready in about sixty seconds.

The project is dual-licensed under Apache 2.0 and Elastic 2.0. The company behind the project offers an enterprise edition with additional support options.
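The workflow described above can be approximated with Python's built-in sqlite3 module: branch a table, then run a Python function over every row with one SQL statement. This is a stand-in, not Databend's API, and it differs in three ways. SQLite's `create_function` runs the Python code in-process rather than in an isolated sandbox. The "branch" here is a full table copy rather than a snapshot. `fake_model_label` is a hypothetical placeholder for a real AI model call.

```python
import sqlite3


def fake_model_label(text):
    """Hypothetical stand-in for an AI model call: a simple keyword rule."""
    return "positive" if "good" in text.lower() else "other"


def run_demo():
    conn = sqlite3.connect(":memory:")

    # "Production" table with unlabeled rows.
    conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT, label TEXT)")
    conn.executemany(
        "INSERT INTO reviews (id, body, label) VALUES (?, ?, NULL)",
        [(1, "Good product"), (2, "Arrived late"), (3, "really GOOD value")],
    )

    # "Branch": a plain full copy. Databend's snapshot-based branching is NOT modeled here.
    conn.execute("CREATE TABLE reviews_branch AS SELECT * FROM reviews")

    # Register the Python function so SQL can call it as model_label(...).
    # It runs in this Python process, not in an isolated container.
    conn.create_function("model_label", 1, fake_model_label)

    # Apply the function to every row of the branch with a single SQL statement.
    conn.execute("UPDATE reviews_branch SET label = model_label(body)")
    conn.commit()

    branch = conn.execute("SELECT id, label FROM reviews_branch ORDER BY id").fetchall()
    prod = conn.execute("SELECT id, label FROM reviews ORDER BY id").fetchall()
    conn.close()
    return branch, prod


if __name__ == "__main__":
    branch_rows, prod_rows = run_demo()
    print("branch:", branch_rows)
    print("production:", prod_rows)
```

After `run_demo()`, every row in the branch is labeled while the production table still has `NULL` labels. This is the property the README highlights for testing a transformation against a branch before touching live data.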
Verify against the repo before relying on details.