borisdayma/dalle-mini

Analysis updated 2026-06-24

★ 14,771 · Python · Audience: researcher · Complexity: 4/5 · License: Apache-2.0 · Setup: hard

Mindmap

mindmap
  root((dalle-mini))
    Inputs
      Text prompts
      VQGAN tokens
    Outputs
      Generated images
      Image tokens
    Use Cases
      Generate images from text
      Train a custom DALL-E mini
      Run inference notebook in Colab
    Tech Stack
      Python
      JAX
      Flax
      Hugging Face
      VQGAN
      BART


What do people build with it?

USE CASE 1

Run text-to-image generation locally using a pretrained checkpoint

USE CASE 2

Train or fine-tune your own DALL-E mini variant on TPU or GPU

USE CASE 3

Open the Colab inference notebook to test prompts without a GPU

USE CASE 4

Spin up a personal craiyon-style web app via the playground project

What is it built with?

Python · JAX · Flax · Hugging Face · VQGAN · BART

How does it compare?

	borisdayma/dalle-mini	powerline/powerline	fauxpilot/fauxpilot
Stars	14,771	14,747	14,741
Language	Python	Python	Python
Setup difficulty	hard	moderate	hard
Complexity	4/5	3/5	4/5
Audience	researcher	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard
Time to first run · 1h+

Inference works on Colab, but full training assumes JAX/TPU experience and pulls heavy checkpoints from Hugging Face.
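Before opening the notebook or running anything locally, it can help to confirm that the packages named in the stack are importable. The snippet below is a minimal sketch that uses only the standard library. The import names jax, flax, and dalle_mini are assumed to match the installed distributions, and missing_packages is a hypothetical helper, not part of the project.

```python
from importlib.util import find_spec

# Assumed top-level import names for the inference stack.
REQUIRED = ("jax", "flax", "dalle_mini")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

This only checks that the packages can be found; it does not verify versions or whether JAX can see a GPU or TPU.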

Apache-2.0: free to use, modify, and distribute with attribution, includes a patent grant.

In plain English

dalle-mini is an open-source effort to recreate DALL-E, OpenAI's original text-to-image model, in a smaller and freely available form. It is the project behind the craiyon.com web app, where anyone can type a prompt and get back generated images. The repo itself holds the model code, training scripts, and inference notebooks.

For people who just want to play with the model, the README points to craiyon.com. For developers, there is a Python package: pip install dalle-mini is enough for inference, while cloning the repo and running pip install -e ".[dev]" sets up a full development environment. An inference pipeline notebook in tools/inference can be opened in Google Colab and stepped through. Training uses tools/train/train.py, and a Weights & Biases sweep configuration file is provided for hyperparameter search.

The trained models live on Hugging Face's Model Hub. There are three: a VQGAN-f16-16384 model that encodes and decodes images, and two text-to-image models, DALL-E mini and the larger DALL-E mega. Under the hood, the system pairs an image encoder from the Taming Transformers paper with a sequence-to-sequence model based on BART, and it draws on several transformer variants and the Distributed Shampoo optimizer.

The README also points readers at community projects: DALL-E Playground for spinning up a personal app, DALL-E Flow for diffusion and upscaling in a human-in-the-loop workflow, and a hosted version on Replicate. The project was initially developed by Boris Dayma, Suraj Patil, Pedro Cuenca, and several others, with computing donated by Google's TPU Research Cloud program.
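The two-stage design described above (a seq2seq model that emits image tokens, and a VQGAN that turns tokens into pixels) can be sketched as a toy data flow. This is not the dalle_mini API: every function below is an illustrative stand-in, the 256-pixel output size is an assumption, and the name VQGAN-f16-16384 is read in the usual Taming Transformers convention (downsampling factor 16, codebook of 16384 entries).

```python
import random

DOWNSAMPLE_FACTOR = 16   # the "f16" in VQGAN-f16-16384
CODEBOOK_SIZE = 16384    # the "16384" in VQGAN-f16-16384
IMAGE_SIZE = 256         # assumed square output resolution, in pixels


def image_token_count(image_size: int, factor: int) -> int:
    """Number of discrete codes needed to cover a square image."""
    side = image_size // factor
    return side * side


def toy_seq2seq(prompt: str, n_tokens: int, seed: int = 0) -> list[int]:
    """Stand-in for the BART-style model: maps text to image-token ids.

    A real model would sample each token autoregressively from learned
    logits; here we only draw reproducible pseudo-random ids so the
    shapes and value ranges are visible.
    """
    rng = random.Random(f"{seed}:{prompt}")
    return [rng.randrange(CODEBOOK_SIZE) for _ in range(n_tokens)]


def to_grid(tokens: list[int], side: int) -> list[list[int]]:
    """Reshape the flat token sequence into the grid a decoder would consume."""
    return [tokens[row * side:(row + 1) * side] for row in range(side)]


n_tokens = image_token_count(IMAGE_SIZE, DOWNSAMPLE_FACTOR)
tokens = toy_seq2seq("an armchair in the shape of an avocado", n_tokens)
grid = to_grid(tokens, IMAGE_SIZE // DOWNSAMPLE_FACTOR)
print(n_tokens, len(grid), len(grid[0]))  # 256 16 16
```

Under these assumptions the text model only has to produce 256 tokens per image rather than tens of thousands of pixel values, which is the main reason for routing generation through a VQGAN codebook.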

Copy-paste prompts

Prompt 1

Set up dalle-mini inference locally with pip install and run one prompt

Prompt 2

Open the inference notebook in Google Colab and walk through generating four images

Prompt 3

Explain how dalle-mini combines VQGAN with a BART-style seq2seq model

Prompt 4

Configure a Weights and Biases sweep for fine-tuning dalle-mini on a custom dataset

Frequently asked questions

What is dalle-mini?

An open-source text-to-image model (the engine behind craiyon.com) that ships with training scripts, inference notebooks, and pretrained DALL-E mini and DALL-E mega checkpoints.

What language is dalle-mini written in?

Mainly Python. The stack also includes JAX and Flax.

What license does dalle-mini use?

Apache-2.0: free to use, modify, and distribute with attribution, includes a patent grant.

How hard is dalle-mini to set up?

Setup difficulty is rated hard, with an hour or more to a first successful run.

Who is dalle-mini for?

Mainly researchers.


Verify against the repo before relying on details.