duration-ai/bonsai-image-android

Analysis updated 2026-05-18

★ 8 | Python | Audience · researcher | Complexity · 5/5 | License · Apache 2.0 | Setup · hard

Mindmap

mindmap
  root((bonsai-image-android))
    What it does
      Text to image on-device
      Hexagon NPU inference
      No network required
      512x512 in ~140 seconds
    Tech Stack
      Python
      C++ runner
      Qualcomm QNN QAIRT
      Android NDK
      stable-diffusion.cpp
    Use Cases
      On-device AI image generation
      Mobile NPU benchmarking
      Offline creative tools
    Audience
      AI researchers
      Mobile ML engineers
      On-device inference specialists

What do people build with it?

USE CASE 1

Generate images from text prompts entirely on a 2025 Android phone without sending data to a server.

USE CASE 2

Benchmark a Snapdragon 8 Elite Hexagon NPU against CPU and GPU paths for a real diffusion transformer workload.

USE CASE 3

Use as a reference for porting other transformer models to run on Qualcomm's Hexagon NPU via QNN.

What is it built with?

Python · C++ · Qualcomm QNN · Android NDK · stable-diffusion.cpp

How does it compare?

	duration-ai/bonsai-image-android	adam-s/car-diagnosis	bongobongo2020/krea2-character-lora-trainer
Stars	8	8	8
Language	Python	Python	Python
Setup difficulty	hard	moderate	moderate
Complexity	5/5	3/5	3/5
Audience	researcher	researcher	vibe coder

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

Requires Qualcomm QAIRT SDK, Android NDK, a device with Hexagon V79, and 10.7 GB of model and binary files not included in the repo.

Apache 2.0 license: use freely for any purpose including commercial use, must include the license notice and NOTICE file.

In plain English

This repo demonstrates running a compact AI image-generation model entirely on a 2025 Android flagship phone, with no internet connection required. The model produces 512x512 images from text prompts, and all processing happens on the device's own chips.

The model is called Bonsai Image. PrismML built it from a 4-billion-parameter architecture called FLUX.2 klein and compressed it with ternary quantization, which stores each weight as one of three possible values instead of a full floating-point number.

Image generation happens in three stages. First, the text prompt is encoded into a numerical representation on the phone's main processor (CPU). Second, a diffusion transformer refines a noisy image over four steps on the phone's dedicated AI chip, Qualcomm's Hexagon NPU. Third, the result is decoded into a visible image, again on the CPU.

The Hexagon NPU path is what makes this interesting. Running the transformer on the CPU would take roughly eight to nine minutes per image, which is not practical. Running it on the NPU brings that down to about two minutes and twenty seconds for the full four-step render. The GPU was also tested, but it crashed or faulted at the 512x512 size. The repo documents benchmark numbers for all three compute paths in detail.

Building this yourself requires the Qualcomm QAIRT SDK, the Android NDK, and a device with a Hexagon V79 chip (a 2025 Snapdragon 8 Elite phone). The process involves exporting the model's 27 individual blocks into compiled binaries for the NPU, cross-compiling a small C++ runner that chains them together on-device, and patching a CPU-side tool so that it hands off the text encoding and image decoding stages correctly. The total bundle size is about 10.7 GB, most of which is the compiled NPU binaries.

This is a companion to the same team's iOS version, which runs the same model on a 2020 iPhone GPU through Apple's MLX framework. Both achieve roughly the same total render time.
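The three-stage flow described above can be modeled as a short control-flow sketch. This is not the repo's code: the real runner is written in C++ against Qualcomm QNN, and every function name here (`encode_prompt`, `run_block`, `denoise`, `decode_image`, `generate`) is hypothetical. It also assumes that each of the four denoising steps passes through all 27 blocks in order, which the description suggests but does not state outright. The stages only record where they would run; no real inference happens.

```python
from dataclasses import dataclass, field

NUM_BLOCKS = 27  # per the description: 27 blocks exported as NPU binaries
NUM_STEPS = 4    # per the description: four denoising steps per image


@dataclass
class Trace:
    """Records (device, stage) pairs in execution order."""
    events: list = field(default_factory=list)

    def log(self, device: str, stage: str) -> None:
        self.events.append((device, stage))


def encode_prompt(prompt: str, trace: Trace) -> list:
    # Stage 1 (CPU): placeholder for the real text encoder.
    trace.log("cpu", "text_encode")
    return [float(ord(ch) % 7) for ch in prompt]


def run_block(index: int, latent: list, cond: list, trace: Trace) -> list:
    # Placeholder for executing one compiled block on the NPU.
    trace.log("npu", f"block_{index}")
    return latent  # identity: a real block would transform the latent


def denoise(latent: list, cond: list, trace: Trace) -> list:
    # Stage 2 (NPU): assumed to chain all blocks once per step.
    for _step in range(NUM_STEPS):
        for index in range(NUM_BLOCKS):
            latent = run_block(index, latent, cond, trace)
    return latent


def decode_image(latent: list, trace: Trace) -> list:
    # Stage 3 (CPU): placeholder for the real image decoder.
    trace.log("cpu", "image_decode")
    return latent


def generate(prompt: str) -> Trace:
    trace = Trace()
    cond = encode_prompt(prompt, trace)
    latent = [0.0] * 16  # placeholder for the initial noise
    latent = denoise(latent, cond, trace)
    decode_image(latent, trace)
    return trace
```

Under these assumptions, one call to `generate` records a CPU stage, then 4 x 27 = 108 NPU block executions, then a final CPU stage, which is why the transformer stage dominates total render time.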
The repo serves as a technical reference for running large AI models on mobile hardware without cloud infrastructure.
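As background for the ternary quantization mentioned above, the idea can be illustrated with a generic "absmean" scheme: divide each weight by the mean absolute value, round, and clamp to -1, 0, or +1. This is an illustrative sketch only. The source does not say which quantization recipe PrismML uses, and the function names `ternary_quantize` and `dequantize` are hypothetical.

```python
def ternary_quantize(weights):
    """Map floats to codes in {-1, 0, +1} plus one shared scale.

    Uses a generic absmean scheme, assumed for illustration;
    it is not confirmed to be the method used for Bonsai Image.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from ternary codes."""
    return [c * scale for c in codes]


codes, scale = ternary_quantize([0.9, -0.05, -1.1, 0.4])
approx = dequantize(codes, scale)
# codes is [1, 0, -1, 1]; each weight now needs under two bits plus a shared scale
```

The trade is precision for size: small weights collapse to zero and large ones saturate at the scale, which is what lets a multi-billion-parameter model fit on a phone.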

Copy-paste prompts

Prompt 1

Walk me through the three-stage pipeline in bonsai-image-android: what runs on the CPU vs the Hexagon NPU and how the C++ runner chains the 27 QNN binaries together.

Prompt 2

What does the npu-split.patch in bonsai-image-android do, and how does it modify sd-cli to hand off CPU and NPU stages correctly?

Prompt 3

Help me reproduce the bonsai-image-android build: I have the QAIRT SDK and an NDK, walk me through export, build, and runner compilation steps.

Prompt 4

Compare the bonsai-image-android NPU results to the CPU and GPU benchmarks documented in the repo and explain why only the NPU finishes a 512x512 render in usable time.

Frequently asked questions

What is bonsai-image-android?

It runs a 4B-parameter AI image generation model entirely on a 2025 Android phone, with the diffusion transformer executing on the Hexagon NPU, and produces 512x512 images from text prompts with no internet connection.

What language is bonsai-image-android written in?

Mainly Python. The stack also includes C++, Qualcomm QNN, the Android NDK, and stable-diffusion.cpp.

What license does bonsai-image-android use?

Apache 2.0 license: use freely for any purpose including commercial use, must include the license notice and NOTICE file.

How hard is bonsai-image-android to set up?

Setup difficulty is rated hard, with roughly 1 day+ to a first successful run.

Who is bonsai-image-android for?

Mainly researchers: AI researchers, mobile ML engineers, and on-device inference specialists.

Verify against the repo before relying on details.