Analysis updated 2026-05-18
Generate images from text prompts entirely on a 2025 Android phone without sending data to a server.
Benchmark a Snapdragon 8 Elite Hexagon NPU against CPU and GPU paths for a real diffusion transformer workload.
Use as a reference for porting other transformer models to run on Qualcomm's Hexagon NPU via QNN.
| | duration-ai/bonsai-image-android | adam-s/car-diagnosis | bongobongo2020/krea2-character-lora-trainer |
|---|---|---|---|
| Stars | 8 | 8 | 8 |
| Language | Python | Python | Python |
| Setup difficulty | hard | moderate | moderate |
| Complexity | 5/5 | 3/5 | 3/5 |
| Audience | researcher | researcher | vibe coder |
Figures from each repo's GitHub metadata at analysis time.
Requires the Qualcomm QAIRT SDK, the Android NDK, a device with a Hexagon V79 NPU, and 10.7 GB of model and binary files that are not included in the repo.
This repo demonstrates running a compact AI image-generation model entirely on a 2025 Android flagship phone, with no internet connection required. The model produces 512x512 images from text prompts, and all processing happens on the device's own chips. The model, Bonsai Image, was built by PrismML from a 4-billion-parameter architecture called FLUX.2 klein and compressed using ternary quantization (each weight is stored as one of three possible values instead of a full floating-point number).
Image generation happens in three stages: the text prompt is encoded into a numerical representation on the phone's CPU, a diffusion transformer refines a noisy image over four steps on the phone's dedicated AI accelerator (Qualcomm's Hexagon NPU), and the result is decoded back into a visible image on the CPU.
The Hexagon NPU path is what makes this interesting. Running the transformer on the CPU would take roughly eight to nine minutes per image, which is not practical. On the NPU, the full four-step render takes about two minutes and twenty seconds, roughly a 3.4x to 3.9x speedup. The GPU path was also tested but crashed or faulted at 512x512. The repo documents benchmark numbers for all three compute paths in detail.
Building this yourself requires the Qualcomm QAIRT SDK, the Android NDK, and a device with a Hexagon V79 NPU (a 2025 Snapdragon 8 Elite phone). The process involves exporting the model's 27 individual blocks into compiled NPU binaries, cross-compiling a small C++ runner that chains them together on-device, and patching a CPU-side tool so it hands off the text-encoding and image-decoding stages correctly. The total bundle is about 10.7 GB, most of which is the compiled NPU binaries.
This is a companion to the same team's iOS version, which runs the same model on a 2020 iPhone GPU through Apple's MLX framework. Both achieve roughly the same total render time.
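As a rough illustration of ternary quantization, here is a minimal Python sketch of one common scheme, absmean quantization as described for BitNet b1.58. This is an assumption: the source does not say which ternary scheme PrismML used, and the function names are hypothetical. Each weight is divided by the tensor's mean absolute value, rounded, and clipped to {-1, 0, +1}; dequantizing multiplies the codes back by that scale.

```python
import math

# Information-theoretic minimum bits for a three-valued weight (about 1.585).
TERNARY_MIN_BITS = math.log2(3)

def ternary_quantize(weights, eps=1e-8):
    """Absmean ternary quantization (BitNet b1.58 style).

    Assumed scheme for illustration only; not PrismML's confirmed method.
    Returns (codes, scale) where every code is -1, 0, or +1.
    """
    # Per-tensor scale: the mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    scale = max(scale, eps)  # guard against an all-zero tensor
    # Scale, round to the nearest integer, then clip into {-1, 0, +1}.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def ternary_dequantize(codes, scale):
    """Approximate the original weights as code * scale."""
    return [c * scale for c in codes]

def storage_gb(num_params, bits_per_weight):
    """Raw weight storage in decimal GB, ignoring scales and other overhead."""
    return num_params * bits_per_weight / 8 / 1e9
```

For example, `ternary_quantize([0.9, -0.05, -1.2, 0.3])` yields codes `[1, 0, -1, 0]` with a scale of about 0.6125. On storage, `storage_gb(4e9, 2)` gives 1.0 GB for 4 billion weights packed at 2 bits each (about 0.79 GB at the `log2(3)` minimum), versus 8.0 GB at FP16. That rough figure ignores per-tensor scales and any layers kept at higher precision.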
The repo serves as a technical reference for running large AI models on mobile hardware without cloud infrastructure.
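The three-stage flow and the 27-block chaining can be sketched as plain Python control flow. Everything here is hypothetical: the stage functions stand in for the real CPU text encoder, the per-block NPU binaries driven by the C++ runner, and the CPU decoder, and the update rule is an assumed Euler step of the kind used by rectified-flow models. The sketch only shows the order of operations, not the repo's actual code.

```python
from typing import Callable, List

Latent = List[float]
Block = Callable[[Latent, float, Latent], Latent]

NUM_BLOCKS = 27  # per the source: the transformer is exported as 27 blocks
NUM_STEPS = 4    # per the source: four denoising steps

def run_transformer(blocks: List[Block], x: Latent, t: float, cond: Latent) -> Latent:
    """Chain the blocks in order; block i's output feeds block i + 1.

    Hypothetical stand-in for the on-device C++ runner.
    """
    h = x
    for block in blocks:
        h = block(h, t, cond)
    return h  # treated here as the predicted velocity

def generate(encode_text: Callable[[str], Latent],
             blocks: List[Block],
             decode: Callable[[Latent], Latent],
             prompt: str,
             noise: Latent) -> Latent:
    cond = encode_text(prompt)  # stage 1: text encoding (CPU)
    x = list(noise)
    # Assumed linear schedule from t = 1 (pure noise) to t = 0 (clean).
    ts = [1.0 - i / NUM_STEPS for i in range(NUM_STEPS + 1)]
    for t, t_next in zip(ts[:-1], ts[1:]):  # stage 2: denoising loop (NPU)
        v = run_transformer(blocks, x, t, cond)
        # Assumed Euler update for a flow-matching model: x <- x + (t_next - t) * v
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return decode(x)  # stage 3: decoding back to an image (CPU)
```

In this sketch, each denoising step makes one full pass through all 27 blocks, so a four-step render makes 108 block invocations.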
Runs a 4B-parameter AI image-generation model entirely on a 2025 Android phone, with the diffusion transformer on the Hexagon NPU, producing 512x512 images from text prompts with no internet connection.
Mainly Python. The stack also includes C++ and Qualcomm QNN.
Apache 2.0 license: use freely for any purpose, including commercial use; you must include a copy of the license and any NOTICE file.
Setup difficulty is rated hard, with roughly a day or more to a first successful run.
Audience: mainly researchers.
Verify against the repo before relying on details.