explaingit

higherorderco/bend

19,359 · Rust · Audience · developer · Complexity · 4/5 · Quiet · License · Setup · hard

TLDR

A programming language that automatically runs your code in parallel across thousands of threads on multi-core CPUs and GPUs, without you managing threads or locks.

Mindmap

Bend
    What it does
      Auto-parallelizes code
      Scales to GPUs
      No manual threading
    How it works
      Detects independent tasks
      Interaction Combinators
      HVM2 runtime
    Use cases
      Sorting algorithms
      Simulations
      Tree traversals
      Number crunching
    Tech stack
      Rust language
      C runtime
      CUDA support
    Audience
      Performance engineers
      Algorithm developers
      GPU programmers

Things people build with this

USE CASE 1

Write divide-and-conquer sorting algorithms that automatically run in parallel across GPUs without manual thread management.
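To make the divide-and-conquer idea concrete, here is a minimal sketch in Python rather than Bend. It is an analogy, not Bend code.

The example is a merge sort whose two recursive calls never share data. The top-level halves are handed to a `concurrent.futures.ThreadPoolExecutor`. In Bend you would not write a pool or a `submit` call, because the independence of the halves is what the runtime exploits on its own.

CPython threads will not speed up this pure-Python work because of the global interpreter lock, so the point is the structure, not the timing. The function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(xs):
    """Plain recursive merge sort; the two recursive calls are independent."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left = merge_sort(xs[:mid])   # never reads anything the right half produces
    right = merge_sort(xs[mid:])  # so both calls could safely run at the same time
    return merge(left, right)


def parallel_merge_sort(xs, pool):
    """Submit the two independent top-level halves to a pool, then merge.

    Only the top-level split is parallel here. That avoids nested waits,
    which could exhaust a fixed-size pool.
    """
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left_job = pool.submit(merge_sort, xs[:mid])
    right_job = pool.submit(merge_sort, xs[mid:])
    return merge(left_job.result(), right_job.result())


if __name__ == "__main__":
    data = [5, 3, 9, 1, 7, 2, 8]
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(parallel_merge_sort(data, pool))  # [1, 2, 3, 5, 7, 8, 9]
```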

USE CASE 2

Build physics simulations or numerical computations that scale from single-core to thousands of concurrent threads transparently.
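A simulation step is a typical source of independent work when each element's new state depends only on its own old state. The Python sketch below uses a hypothetical one-dimensional particle model. The `Particle` class, `step`, and `step_all` are illustrative and are not part of Bend.

The sketch splits the particle list recursively so that the independent halves are explicit. Here the halves still run one after the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Particle:
    x: float  # position (m)
    v: float  # velocity (m/s)


def step(p, dt, g=-9.8):
    """One explicit Euler step for a single particle under constant gravity.

    The new state depends only on this particle's old state, so every
    particle could be updated independently of the others.
    """
    return Particle(x=p.x + p.v * dt, v=p.v + g * dt)


def step_all(particles, dt):
    """Advance all particles by splitting the list in half recursively.

    The two halves never read each other's data. That is the kind of
    independence an auto-parallelizing runtime can exploit; this Python
    version simply evaluates them sequentially.
    """
    if len(particles) <= 1:
        return [step(p, dt) for p in particles]
    mid = len(particles) // 2
    return step_all(particles[:mid], dt) + step_all(particles[mid:], dt)


if __name__ == "__main__":
    state = [Particle(0.0, 2.0), Particle(10.0, 0.0), Particle(5.0, -1.0)]
    for _ in range(3):
        state = step_all(state, 0.5)
    print(state)
```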

USE CASE 3

Implement tree traversals and other recursive algorithms that Bend automatically distributes across the available processors.
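Tree traversals expose the same independence: a node's left and right subtrees can be processed separately and their results combined. This Python sketch uses hypothetical `Leaf`/`Node` types and `build`/`tree_sum` helpers. It builds a complete binary tree and sums its leaves recursively. It runs sequentially and only illustrates the shape of computation that Bend is designed to parallelize.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    value: int


@dataclass(frozen=True)
class Node:
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def build(depth, index=0):
    """Build a complete binary tree whose leaves hold 0, 1, ..., 2**depth - 1."""
    if depth == 0:
        return Leaf(index)
    # Each subtree is constructed without looking at its sibling.
    return Node(build(depth - 1, index * 2), build(depth - 1, index * 2 + 1))


def tree_sum(tree):
    """Sum every leaf; the left and right subtrees are traversed independently."""
    if isinstance(tree, Leaf):
        return tree.value
    return tree_sum(tree.left) + tree_sum(tree.right)
```

For example, `tree_sum(build(3))` adds the leaves 0 through 7 and returns 28.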

USE CASE 4

Optimize computationally heavy workloads on NVIDIA GPUs without rewriting code in CUDA or dealing with multi-threading bugs.

Tech stack

Rust · C · CUDA · HVM2

Getting it running

Difficulty · hard · Time to first run · 1 day+

Requires installing the CUDA toolkit, compiling Rust code from source, and setting up the HVM2 runtime. GPU drivers and several other build dependencies are also needed.

Use freely for any purpose, including commercial use. Keep the license and copyright notices and state any changes you make; the license also includes a patent grant.

In plain English

Bend is a high-level programming language designed to run your code automatically across thousands of parallel threads, on multi-core CPUs and on GPUs. You do not have to manage threads, locks, or any of the usual parallel-programming complexity. Think of it as writing ordinary Python- or Haskell-style code that scales to over 10,000 concurrent threads on its own.

The core idea is simple. If your algorithm can be broken into independent tasks (for example, splitting a calculation into two halves that don't depend on each other), Bend runs those tasks simultaneously without you marking them as parallel. You write the logic; Bend handles the parallelism. Execution is handled by an underlying runtime called HVM2, which evaluates programs using a model of computation called Interaction Combinators.

You'd reach for Bend when you have computationally heavy problems (sorting, simulations, tree traversals, number crunching) and want to get the most out of modern hardware. It lets you do that without rewriting your code in CUDA or chasing multi-threading bugs. It is especially well suited to divide-and-conquer algorithms that naturally split into independent subtasks.

Bend itself is written in Rust and executes programs through C and CUDA runtimes. GPU support currently covers NVIDIA GPUs only, and Windows users need WSL2. Single-core performance is still being improved, but multi-core and GPU speedups are already demonstrated with real benchmarks.
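Why does "splits into independent halves" matter so much? One way to see it is to count the critical path: the longest chain of steps that must happen one after another. The Python sketch below compares a left-to-right accumulating loop with a halving reduction over `n` items. It is a general property of dependency graphs, not a model of HVM2 or Interaction Combinators, and both function names are hypothetical.

```python
from functools import lru_cache


def linear_fold_depth(n):
    """Critical-path length of a left-to-right loop that sums n items.

    Each addition needs the previous running total, so the n - 1 additions
    form one chain and must run strictly in order.
    """
    return max(n - 1, 0)


@lru_cache(maxsize=None)
def tree_fold_depth(n):
    """Critical-path length when n items are combined by repeated halving.

    The two halves are reduced independently, so only the deeper half
    contributes to the path, plus one final combine step.
    """
    if n <= 1:
        return 0
    half = n // 2
    return 1 + max(tree_fold_depth(half), tree_fold_depth(n - half))


for n in (8, 1024, 1_000_000):
    print(n, linear_fold_depth(n), tree_fold_depth(n))
# 8 7 3
# 1024 1023 10
# 1000000 999999 20
```

The loop leaves nothing for a parallel runtime to overlap. The halving version needs only about log2(n) dependent steps, so many combines can proceed at once at each level. This is the shape of program the text above describes as a natural fit.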

Copy-paste prompts

Prompt 1
Show me a Bend program that sorts a large array using a divide-and-conquer approach and explain how it automatically parallelizes.
Prompt 2
How do I write a recursive tree traversal in Bend that will automatically run in parallel across multiple GPU cores?
Prompt 3
Give me a Bend example of a simple algorithm that demonstrates how the language detects and exploits independent tasks for parallelism.
Prompt 4
What are the performance differences between writing a simulation in Bend versus manually managing CUDA threads for the same problem?
Prompt 5
How do I set up Bend on Windows with WSL2 and run my first parallel program on an NVIDIA GPU?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.