explaingit

dynamic-alpha/ducktape

19 · Clojure · Audience · data · Complexity · 3/5 · Active · Setup · moderate

TLDR

Clojure library that bridges tech.v3.dataset and DuckDB using Java 22 Panama, with broader column-type coverage and a streaming appender for small-batch loads.

Mindmap

mindmap
  root((ducktape))
    Inputs
      Dataset tables
      SQL queries
      Batched records
    Outputs
      DuckDB tables
      Query results
    Use Cases
      Replace tmducken
      Stream Kafka batches
      Ingest paginated API
      Run SQL on datasets
    Tech Stack
      Clojure
      DuckDB
      Java 22 Panama
      tech.v3.dataset

Things people build with this

USE CASE 1

Move data between Clojure tech.v3.dataset and DuckDB inside one process

USE CASE 2

Replace tmducken with a Panama-backed bridge for more predictable memory release

USE CASE 3

Stream small Kafka or paginated API batches into DuckDB through a reusable appender

USE CASE 4

Work with richer DuckDB types like STRUCT, MAP, LIST, ENUM, and HUGEINT from Clojure

Tech stack

Clojure · DuckDB · Java · Panama

Getting it running

Difficulty · moderate
Time to first run · 30 min

Requires Java 22 or newer, DuckDB 1.5 or newer, and the JVM option that enables native access (--enable-native-access).

In plain English

Ducktape is a Clojure library that connects two existing pieces of software. The first is tech.v3.dataset, a Clojure library for working with tables of data in memory. The second is DuckDB, an embedded database that runs inside your own program rather than as a separate server. Ducktape lets a Clojure program move data back and forth between the two.

The README presents it as a near drop-in replacement for an older project called tmducken. The main reason the author gives for writing a new library is how it talks to DuckDB. The older library, tmducken, uses a bridge called JNA. Ducktape uses Java's newer Foreign Function and Memory API (Project Panama), which became a final, non-preview feature in Java 22, hence the Java 22 requirement.

The README claims this brings two practical benefits. First, memory used by the database is released at predictable times instead of waiting for the garbage collector, which the author says removes a class of crashes that could happen with the older approach. Second, the library covers more DuckDB column types than tmducken did, including BLOB, HUGEINT, DECIMAL, INTERVAL, ENUM, LIST, STRUCT, MAP, and several timestamp precision variants. A table in the README lists each type and confirms it can be both read and written.

A second feature is a streaming appender. For programs that feed a database many small batches, such as Kafka consumers or paginated API ingest, the appender keeps a single connection ready across batches instead of paying the setup cost each time. The README claims this can be up to ten times faster for small-batch loads, and that overall read and write performance is up to four times faster than tmducken, with benchmark numbers shown lower in the README.

Installation is one Clojure dependency and one JVM option that turns on native access. Java 22 or newer and DuckDB 1.5 or newer are required. The quick start example opens an in-memory database, creates a small table of names and scores from a dataset, and runs a SQL query against it.
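To make the "predictable memory release" claim concrete, here is a minimal sketch of the Panama mechanism it most likely rests on. It uses only the standard java.lang.foreign API from Java 22 and is written in Java because that is where the API lives. It is not ducktape's code, and the class and method names (ArenaDemo, roundTrip, isReleasedAfterClose) are invented for illustration. Off-heap memory allocated from a confined Arena is freed when the arena is closed, so the release point is a known line in the program and does not depend on a garbage collection cycle.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical demo class; not part of ducktape.
class ArenaDemo {

    // Writes a long into off-heap memory and reads it back.
    // The memory is freed when the try block exits, not when the GC runs.
    static long roundTrip(long value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(ValueLayout.JAVA_LONG);
            seg.set(ValueLayout.JAVA_LONG, 0, value);
            return seg.get(ValueLayout.JAVA_LONG, 0);
        }
    }

    // Returns true if a segment is no longer alive once its arena is closed.
    static boolean isReleasedAfterClose() {
        MemorySegment seg;
        try (Arena arena = Arena.ofConfined()) {
            seg = arena.allocate(ValueLayout.JAVA_LONG);
        }
        // The arena was closed at the end of the try block above.
        return !seg.scope().isAlive();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L));          // prints 42
        System.out.println(isReleasedAfterClose());  // prints true
    }
}
```

A bridge built this way can tie the lifetime of native buffers to a scope such as a query or a batch. Whether ducktape scopes its arenas per query, per connection, or some other way is not stated in this summary, so check the repo for the actual design.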

Copy-paste prompts

Prompt 1
Add ducktape to my deps.edn, enable native access on Java 22, and open an in-memory DuckDB session
Prompt 2
Convert a tech.v3.dataset of 100k rows into a DuckDB table and run a GROUP BY query on it
Prompt 3
Migrate a tmducken pipeline to ducktape and call out any code changes I need to make
Prompt 4
Use the streaming appender to insert Kafka batches into DuckDB with one persistent connection
Prompt 5
Show me how to round-trip a STRUCT column between tech.v3.dataset and DuckDB

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.