explaingit

haskallcurry/interpolation_search

0RustAudience · developerComplexity · 3/5ActiveSetup · easy

TLDR

Rust benchmark that compares interpolation search, binary search, and an open-addressed hash set on 64-bit integer lookups over large sorted sets.

Mindmap

mindmap
  root((interpolation-search))
    Inputs
      Size flags
      Query count
      Cold copies
      Random seed
    Outputs
      Timing tables
      SVG diagrams
      HTML draggable label
    Use Cases
      Compare lookup algorithms
      Reproduce a viral tweet
      Stress test cache behaviour
    Tech Stack
      Rust
      Cargo

Things people build with this

USE CASE 1

Reproduce the tweet's interpolation vs binary search timing comparison

USE CASE 2

Benchmark a 1 billion key lookup if your machine has 40+ GiB of RAM

USE CASE 3

Generate the SVG and HTML diagrams that illustrate probe positions

USE CASE 4

Study an open-addressed hash set with linear probing in idiomatic Rust

Tech stack

RustCargo

Getting it running

Difficulty · easy Time to first run · 5min

Just cargo run --release works for the default 1K and 1M cases; the --include-1b switch needs roughly 40 GiB of RAM.

In plain English

interpolation_search is a small Rust project that benchmarks three different ways of looking up a 64 bit integer in a large sorted set. It is built around a comparison from a tweet, referred to in the code as the tweet.txt experiment. The three contenders are interpolation search over a sorted array of unique u64 keys, the standard library's slice::binary_search on the same array, and a deliberately simple open addressed hash set with linear probing, modular wraparound, the value 0 as the empty sentinel, and a 75 percent target load factor. The benchmark generator is deterministic. Stored keys are produced by feeding the integers 0 through n into a mix64 hash function, sorted for the array searches and shuffled before being inserted into the hash set. Timed queries draw a pseudorandom xorshift64 index in the range 0 to 2n and look up mix64 of that index, giving a 50 percent hit rate without storing a separate query stream and without a short repeated access cycle. A smoke run uses cargo run --release with flags such as --sizes 1K, --queries 100000, and --cold-copies 1. Running cargo run --release on its own runs the default 1K and 1M benchmarks. The --include-1b switch adds a 1 billion entry case, but only fits on a machine with enough memory, since the 1B hash set cold arena alone is about 40 GiB. The --cold-copies flag controls how many copies of the data are kept resident at the largest requested size, and smaller sizes scale inversely so the total cold footprint stays roughly constant. A --cold-reference-size option applies the same scaling while running only a subset of sizes, which is useful when sweeping random seeds at the smaller end. Alongside timings, the program writes three visual artefacts: an SVG table for the first tweet comparison, an SVG illustration of binary midpoint versus interpolation probe for the second tweet, and an HTML draggable label version of that second diagram. The --no-svg flag skips assets, and --tweet2-only regenerates only the second tweet output without re running the benchmarks.

Copy-paste prompts

Prompt 1
Walk me through how interpolation_search generates deterministic keys with mix64
Prompt 2
Explain what --cold-copies and --cold-reference-size do to the working set size
Prompt 3
Help me add a new search variant like Eytzinger layout to the benchmark harness
Prompt 4
Show me how the SVG probe diagram is built so I can tweak the styling
Open on GitHub → Explain another repo

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.