explaingit

ml-gsai/llada

3,781 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

TLDR

A research project that trains an 8-billion-parameter language model using a diffusion approach, gradually unmasking a full response, instead of predicting one word at a time like most AI models.

Mindmap

mindmap
  root((LLaDA))
    What it does
      Diffusion language model
      Unmask to generate
      8B parameters
    Models
      Base pretrained
      Instruct tuned
      Vision LLaDA-V
      MoE variant
    Usage
      Terminal chat
      Gradio web UI
      Benchmark eval
    Research
      Masking objective
      Proper probability
      Scales like GPT


Things people build with this

USE CASE 1

Run the LLaDA-8B-Instruct model to experiment with diffusion-based text generation as an alternative to GPT-style autoregressive models.

USE CASE 2

Launch the Gradio web interface to interact with LLaDA through a browser without writing any code.

USE CASE 3

Evaluate LLaDA on standard language model benchmarks using the lm-evaluation-harness framework.

USE CASE 4

Study the masking-based training objective as a blueprint for building your own diffusion language model.
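
As a starting point for that study, here is a minimal sketch of a masking-based objective in plain Python. It is not taken from the repository. The details are assumptions based on a general reading of masked-diffusion training: the mask ratio `t` is drawn uniformly, each token is hidden independently with probability `t`, and the cross-entropy on hidden positions is weighted by `1/t`. The names `MASK`, `mask_tokens`, `masked_loss`, and `uniform_model` are hypothetical.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask symbol for this toy example


def mask_tokens(tokens, t, rng):
    """Forward process (assumed): hide each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_loss(tokens, noisy, predict_proba, t):
    """Negative log-probability on masked positions only, weighted by 1/t.

    predict_proba(noisy, i, target) is a hypothetical stand-in for the model:
    it returns the probability assigned to `target` at position i.
    """
    total = 0.0
    for i, (target, seen) in enumerate(zip(tokens, noisy)):
        if seen == MASK:
            total -= math.log(predict_proba(noisy, i, target))
    # Normalise by sequence length so losses are comparable across lengths.
    return total / (t * len(tokens))


def uniform_model(noisy, i, target, vocab_size=4):
    """Toy model that spreads probability evenly over a 4-word vocabulary."""
    return 1.0 / vocab_size


rng = random.Random(0)
tokens = ["the", "cat", "sat", "down"]
t = rng.uniform(0.05, 1.0)          # sampled mask ratio for this example
noisy = mask_tokens(tokens, t, rng)
loss = masked_loss(tokens, noisy, uniform_model, t)
print(noisy, round(loss, 3))
```

A real implementation would replace `uniform_model` with a Transformer that predicts all masked positions in one forward pass; check the paper and repository for the exact formulation.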

Tech stack

Python · PyTorch · Transformers · Hugging Face · Gradio

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires a GPU with enough VRAM to load an 8B model. Inference is slower than with autoregressive models because generation takes multiple passes.
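
To put a rough number on "enough VRAM", the sketch below estimates the memory needed for the weights alone at a few precisions. It assumes exactly 8 billion parameters and ignores activations, framework overhead, and any buffers, so treat the results as lower bounds. The int8 row is purely illustrative; this page does not say whether quantised checkpoints exist.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Memory for the raw weights only, in GiB (1 GiB = 1024**3 bytes)."""
    return n_params * bytes_per_param / 1024**3


N_PARAMS = 8e9  # assumption: "8B" taken as exactly 8 billion parameters

for name, nbytes in [("float32", 4), ("bfloat16/float16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_gib(N_PARAMS, nbytes):.1f} GiB")
```

At 16-bit precision the weights alone come to roughly 15 GiB, which is why a 16 GB card leaves very little headroom.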

Open-source research code. Check the repository for the exact license terms.
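
Loading the instruction-tuned checkpoint with Transformers might look like the sketch below. The Hub ID `GSAI-ML/LLaDA-8B-Instruct`, the use of `trust_remote_code=True`, and the choice of `bfloat16` are assumptions that should be verified against the repository README. `load_llada` is a hypothetical helper name.

```python
MODEL_ID = "GSAI-ML/LLaDA-8B-Instruct"  # assumption: verify the Hub ID on the repo page


def load_llada(model_id=MODEL_ID):
    """Load tokenizer and model; the first call downloads the full weights."""
    # Imported lazily so this file can be read or imported without the heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,      # assumption: the checkpoint ships custom model code
        torch_dtype=torch.bfloat16,  # 16-bit weights to halve memory versus float32
    )
    return tokenizer, model.eval()
```

Calling `tokenizer, model = load_llada()` and then moving the model to a GPU with `model.to("cuda")` is the usual next step; text generation itself goes through the repository's own scripts rather than a standard `generate` call.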

In plain English

LLaDA stands for Large Language Diffusion with mAsking. It is a research project from GSAI, the Gaoling School of Artificial Intelligence at Renmin University of China, that trains a large language model using a diffusion approach rather than the autoregressive method that most popular language models use today. The result is an 8-billion-parameter model that the authors say performs comparably to Meta's LLaMA3 8B on a range of benchmarks.

Most language models generate text by predicting one token at a time, always moving left to right. LLaDA takes a different route. It starts with a response in which every token is masked, or hidden, and then unmasks tokens over multiple steps until the full answer is revealed. The theoretical motivation is that this approach forms a proper generative model with a well-defined probability distribution over text, which the team argues BERT-style masked models do not achieve. The training objective is an upper bound on the model's negative log-likelihood, which the authors present as the principled grounding that lets the approach scale and generalize.

In practice, you interact with LLaDA much like any other open-weights language model. The pretrained base model and an instruction-tuned variant called LLaDA-8B-Instruct are both available on Hugging Face, and both load through the Transformers library. The repo includes scripts for running a chat session in the terminal, launching a Gradio web interface for a visual demo, and evaluating the model on standard benchmarks using the lm-evaluation-harness framework.

The project has grown since the original February 2025 paper. A vision-language version called LLaDA-V has been added, along with LLaDA 1.5, which improves preference alignment. A Mixture-of-Experts variant called LLaDA-MoE uses only about one billion active parameters at inference time while reportedly outperforming the dense 8B model on some tasks.

One known limitation is sampling speed. Because LLaDA generates a fixed-length response in multiple passes rather than streaming tokens one by one, it is currently slower than autoregressive models and cannot use the KV-cache optimizations those models rely on. The authors acknowledge this and point to ongoing work in the broader diffusion model community to close the gap.
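
The unmasking loop described above can be illustrated with a toy sampler. This is a sketch under stated assumptions, not the repository's sampler: `toy_predict` is a hypothetical stand-in that returns random guesses with random confidences, and revealing the most confident guesses first is a choice made here for illustration. The fixed response length and the multi-pass structure are the parts that mirror the description.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol


def toy_predict(seq, rng):
    """Hypothetical model stand-in: one (token, confidence) guess per masked slot."""
    vocab = ["diffusion", "models", "unmask", "text"]
    return {i: (rng.choice(vocab), rng.random())
            for i, tok in enumerate(seq) if tok == MASK}


def generate(length, steps, seed=0):
    rng = random.Random(seed)
    seq = [MASK] * length                       # start from a fully masked response
    for step in range(steps):
        guesses = toy_predict(seq, rng)         # one full pass over the sequence
        if not guesses:
            break
        remaining_steps = steps - step
        # Reveal an even share of what is left; the final step reveals everything.
        n_reveal = -(-len(guesses) // remaining_steps)  # ceiling division
        most_confident = sorted(
            guesses, key=lambda i: guesses[i][1], reverse=True
        )[:n_reveal]
        for i in most_confident:
            seq[i] = guesses[i][0]
    return seq


print(generate(length=8, steps=4))
```

Each loop iteration corresponds to one full forward pass over the whole response, which is why the number of steps, rather than the number of tokens, drives the inference cost.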

Copy-paste prompts

Prompt 1
Show me how to load LLaDA-8B-Instruct from Hugging Face and run a chat session with it using the provided terminal script.
Prompt 2
How do I launch the Gradio demo for LLaDA on my local machine so I can test it through a browser interface?
Prompt 3
Walk me through running LLaDA on the standard lm-evaluation-harness benchmarks and comparing the results to LLaMA3 8B.
Prompt 4
Explain the LLaDA masking-based training objective. How does starting from a fully masked response and unmasking tokens differ from autoregressive generation?
Prompt 5
What hardware do I need to run LLaDA-8B inference, and are there any quantised versions available to reduce memory requirements?

Verify against the repo before relying on details.