Analysis updated 2026-06-24
Run text-to-image generation locally using a pretrained checkpoint
Train or fine-tune your own DALL-E mini variant on TPU or GPU
Open the Colab inference notebook to test prompts without a local GPU
Spin up a personal craiyon-style web app via the playground project
| | borisdayma/dalle-mini | powerline/powerline | fauxpilot/fauxpilot |
|---|---|---|---|
| Stars | 14,771 | 14,747 | 14,741 |
| Language | Python | Python | Python |
| Setup difficulty | hard | moderate | hard |
| Complexity | 4/5 | 3/5 | 4/5 |
| Audience | researcher | developer | developer |
Figures from each repo's GitHub metadata at analysis time.
Inference works on Colab, but full training assumes JAX/TPU experience and pulls heavy checkpoints from Hugging Face.
dalle-mini is an open-source effort to recreate DALL-E, OpenAI's original text-to-image model, in a smaller and freely available form. The README is the home page for the project that powers the craiyon.com web app, where anyone can type a prompt and get back generated images. The repo itself holds the model code, training scripts, and inference notebooks.

For people who just want to play with the model, the README points at craiyon.com. For developers, there is a Python package: pip install dalle-mini is enough for inference only, and cloning the repo with pip install -e .[dev] sets up a full development environment. There is an inference pipeline notebook in tools/inference that can be opened in Google Colab and stepped through. Training uses tools/train/train.py, and a Weights & Biases sweep configuration file is provided for hyperparameter search.

The trained models live on Hugging Face's Model Hub. There are three: a VQGAN-f16-16384 model that encodes and decodes images, and two text-to-image models named DALL-E mini and the larger DALL-E mega. Behind the scenes the system uses an image encoder from the Taming Transformers paper and a sequence-to-sequence model based on BART, with several transformer variants and the Distributed Shampoo optimizer.

The README also points readers at community projects: DALL-E Playground for spinning up a personal app, DALL-E Flow for diffusion and upscaling in a human-in-the-loop workflow, and a Replicate hosted version. The project was initially developed by Boris Dayma, Suraj Patil, Pedro Cuenca, and several others, with computing donated by Google's TPU Research Cloud program.
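As a rough illustration of what the VQGAN-f16-16384 name implies, the sketch below assumes the usual reading of that naming convention: f16 is a 16x spatial downsampling factor and 16384 is the codebook size. The helper name `vqgan_token_grid` and the 256-pixel input size are hypothetical choices made for this example and are not taken from the README.

```python
import math


def vqgan_token_grid(image_size: int,
                     downsample_factor: int = 16,
                     codebook_size: int = 16384):
    """Return (grid_side, num_tokens, bits_per_token) for a square image.

    Assumption: the encoder shrinks each spatial dimension by
    `downsample_factor` and maps every grid cell to one codebook index.
    """
    if image_size % downsample_factor != 0:
        raise ValueError("image_size must be a multiple of downsample_factor")
    grid_side = image_size // downsample_factor
    num_tokens = grid_side * grid_side
    # Each token selects one of `codebook_size` entries.
    bits_per_token = math.log2(codebook_size)
    return grid_side, num_tokens, bits_per_token


# Hypothetical 256x256 input image.
side, tokens, bits = vqgan_token_grid(256)
print(f"{side}x{side} grid, {tokens} image tokens, {bits:.0f} bits per token")
```

Under these assumptions, the sequence-to-sequence model only has to predict a short sequence of discrete image tokens from the text prompt, and the VQGAN decoder turns that sequence back into pixels.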
Open-source text-to-image model (the engine behind craiyon.com) with training scripts, inference notebooks, and pretrained DALL-E mini/mega checkpoints.
Mainly Python. The stack also includes JAX and Flax.
Apache-2.0: free to use, modify, and distribute with attribution; includes a patent grant.
Setup difficulty is rated hard, with roughly 1h+ to a first successful run.
Mainly researchers.
Verify against the repo before relying on details.