explaingit

borisdayma/dalle-mini

Analysis updated 2026-06-24

Stars · 14,771 | Language · Python | Audience · researcher | Complexity · 4/5 | License · Apache-2.0 | Setup · hard

TLDR

Open-source text-to-image model (the engine behind craiyon.com) with training scripts, inference notebooks, and pretrained DALL-E mini/mega checkpoints.

Mindmap

mindmap
  root((dalle-mini))
    Inputs
      Text prompts
      VQGAN tokens
    Outputs
      Generated images
      Image tokens
    Use Cases
      Generate images from text
      Train a custom DALL-E mini
      Run inference notebook in Colab
    Tech Stack
      Python
      JAX
      Flax
      Hugging Face
      VQGAN
      BART

What do people build with it?

USE CASE 1

Run text-to-image generation locally using a pretrained checkpoint

USE CASE 2

Train or fine-tune your own DALL-E mini variant on TPU or GPU

USE CASE 3

Open the Colab inference notebook to test prompts without a GPU

USE CASE 4

Spin up a personal craiyon-style web app via the playground project

What is it built with?

Python, JAX, Flax, Hugging Face, VQGAN, BART

How does it compare?

                    borisdayma/dalle-mini   powerline/powerline   fauxpilot/fauxpilot
Stars               14,771                  14,747                14,741
Language            Python                  Python                Python
Setup difficulty    hard                    moderate              hard
Complexity          4/5                     3/5                   4/5
Audience            researcher              developer             developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard
Time to first run · 1h+

Inference works on Colab, but full training assumes JAX/TPU experience and pulls heavy checkpoints from Hugging Face.

Apache-2.0: free to use, modify, and distribute with attribution; includes a patent grant.

In plain English

dalle-mini is an open-source effort to recreate DALL-E, OpenAI's original text-to-image model, in a smaller and freely available form. The README is the home page for the project that powers the craiyon.com web app, where anyone can type a prompt and get back generated images. The repo itself holds the model code, training scripts, and inference notebooks.

For people who just want to play with the model, the README points at craiyon.com. For developers, there is a Python package: pip install dalle-mini is enough for inference only, and cloning the repo with pip install -e .[dev] sets up a full development environment. There is an inference pipeline notebook in tools/inference that can be opened in Google Colab and stepped through. Training uses tools/train/train.py, and a Weights & Biases sweep configuration file is provided for hyperparameter search.

The trained models live on Hugging Face's Model Hub. There are three: a VQGAN-f16-16384 model that encodes and decodes images, and two text-to-image models named DALL-E mini and the larger DALL-E mega. Behind the scenes the system uses an image encoder from the Taming Transformers paper and a sequence-to-sequence model based on BART, with several transformer variants and the Distributed Shampoo optimizer.

The README also points readers at community projects: DALL-E Playground for spinning up a personal app, DALL-E Flow for diffusion and upscaling in a human-in-the-loop workflow, and a Replicate hosted version. The project was initially developed by Boris Dayma, Suraj Patil, Pedro Cuenca, and several others, with computing donated by Google's TPU Research Cloud program.
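The two-stage design described above (a BART-style seq2seq model that emits image tokens, and a VQGAN that turns those tokens back into pixels) can be illustrated with a toy sketch. This is not the project's code and does not use its API. Every function name below is hypothetical, the 256x256 image size is an assumption, and the "model" is a seeded random number generator standing in for learned weights. The sketch only shows the shape of the data that flows between the two stages, reading "f16" as a downsampling factor of 16 and "16384" as the codebook size.

```python
import math
import random

# Values read off the model name VQGAN-f16-16384 (interpretation, not project code).
DOWNSAMPLE_FACTOR = 16   # "f16": each token covers a 16x16 pixel patch
CODEBOOK_SIZE = 16384    # "16384": number of distinct image tokens
IMAGE_SIZE = 256         # ASSUMPTION: square 256x256 images


def image_token_count(image_size: int, factor: int) -> int:
    """Number of discrete tokens that represent one square image."""
    side = image_size // factor
    return side * side


def toy_text_to_image_tokens(prompt: str, n_tokens: int, seed: int = 0) -> list[int]:
    """Hypothetical stand-in for the seq2seq stage.

    A real model would predict each image token autoregressively from the
    encoded prompt; here a seeded RNG just produces valid codebook indices.
    """
    rng = random.Random(f"{seed}:{prompt}")
    return [rng.randrange(CODEBOOK_SIZE) for _ in range(n_tokens)]


def toy_decode(tokens: list[int], factor: int) -> tuple[list[list[int]], int]:
    """Hypothetical stand-in for the VQGAN decoder.

    Reshapes the flat token sequence into a square grid and reports the side
    length, in pixels, of the image a real decoder would reconstruct from it.
    """
    side = math.isqrt(len(tokens))
    if side * side != len(tokens):
        raise ValueError("token count must be a perfect square")
    grid = [tokens[row * side:(row + 1) * side] for row in range(side)]
    return grid, side * factor


n_tokens = image_token_count(IMAGE_SIZE, DOWNSAMPLE_FACTOR)
tokens = toy_text_to_image_tokens("an armchair shaped like an avocado", n_tokens)
grid, pixels = toy_decode(tokens, DOWNSAMPLE_FACTOR)
print(f"{n_tokens} tokens -> {len(grid)}x{len(grid[0])} grid -> {pixels}x{pixels} image")
```

Under these assumptions, one image is a sequence of 256 integers, which is what lets a text-style sequence model generate pictures. The real models, weights, and inference calls are in the repo's inference notebook.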

Copy-paste prompts

Prompt 1
Set up dalle-mini inference locally with pip install and run one prompt
Prompt 2
Open the inference notebook in Google Colab and walk through generating four images
Prompt 3
Explain how dalle-mini combines VQGAN with a BART-style seq2seq model
Prompt 4
Configure a Weights and Biases sweep for fine-tuning dalle-mini on a custom dataset

Frequently asked questions

What is dalle-mini?

Open-source text-to-image model (the engine behind craiyon.com) with training scripts, inference notebooks, and pretrained DALL-E mini/mega checkpoints.

What language is dalle-mini written in?

Mainly Python. The stack also includes JAX and Flax.

What license does dalle-mini use?

Apache-2.0: free to use, modify, and distribute with attribution; includes a patent grant.

How hard is dalle-mini to set up?

Setup difficulty is rated hard, with an hour or more to a first successful run.

Who is dalle-mini for?

Mainly researchers.

Verify against the repo before relying on details.