Reproduce the tweet's interpolation vs binary search timing comparison
Benchmark a 1 billion key lookup if your machine has 40+ GiB of RAM
Generate the SVG and HTML diagrams that illustrate probe positions
Study an open-addressed hash set with linear probing in idiomatic Rust
Just cargo run --release works for the default 1K and 1M cases; the --include-1b switch needs roughly 40 GiB of RAM.
interpolation_search is a small Rust project that benchmarks three different ways of looking up a 64 bit integer in a large sorted set. It is built around a comparison from a tweet, referred to in the code as the tweet.txt experiment. The three contenders are interpolation search over a sorted array of unique u64 keys, the standard library's slice::binary_search on the same array, and a deliberately simple open addressed hash set with linear probing, modular wraparound, the value 0 as the empty sentinel, and a 75 percent target load factor. The benchmark generator is deterministic. Stored keys are produced by feeding the integers 0 through n into a mix64 hash function, sorted for the array searches and shuffled before being inserted into the hash set. Timed queries draw a pseudorandom xorshift64 index in the range 0 to 2n and look up mix64 of that index, giving a 50 percent hit rate without storing a separate query stream and without a short repeated access cycle. A smoke run uses cargo run --release with flags such as --sizes 1K, --queries 100000, and --cold-copies 1. Running cargo run --release on its own runs the default 1K and 1M benchmarks. The --include-1b switch adds a 1 billion entry case, but only fits on a machine with enough memory, since the 1B hash set cold arena alone is about 40 GiB. The --cold-copies flag controls how many copies of the data are kept resident at the largest requested size, and smaller sizes scale inversely so the total cold footprint stays roughly constant. A --cold-reference-size option applies the same scaling while running only a subset of sizes, which is useful when sweeping random seeds at the smaller end. Alongside timings, the program writes three visual artefacts: an SVG table for the first tweet comparison, an SVG illustration of binary midpoint versus interpolation probe for the second tweet, and an HTML draggable label version of that second diagram. The --no-svg flag skips assets, and --tweet2-only regenerates only the second tweet output without re running the benchmarks.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.