sunwood-ai-labs/sana

Analysis updated 2026-06-24

★ 0 · Audience · researcher · Complexity · 5/5 · Setup · hard

Mindmap

mindmap
  root((Sana))
    Inputs
      Text prompts
      Reference images
      Camera controls
    Outputs
      High-res images
      Short videos
      World model clips
    Use Cases
      Generate 1024px images
      Make 5s text-to-video clips
      Serve a SANA API
      Fine-tune SANA variants
    Tech Stack
      Python
      PyTorch
      Diffusers
      SGLang
      ComfyUI


What do people build with it?

USE CASE 1

Generate images locally with the SANA Linear Diffusion Transformer

USE CASE 2

Produce 5 second text-to-video clips with SANA-Video

USE CASE 3

Serve a SANA model through SGLang with an OpenAI-compatible API

USE CASE 4

Post-train SANA with supervised fine-tuning or RL via Cosmos-RL

What is it built with?

Python, PyTorch, Diffusers, SGLang, ComfyUI, CUDA

How does it compare?

	sunwood-ai-labs/sana	0xhassaan/nn-from-scratch	0xzgbot/hermes-comfyui-skills
Stars	0	0	0
Language	—	Python	—
Setup difficulty	hard	moderate	easy
Complexity	5/5	4/5	1/5
Audience	researcher	developer	designer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard · Time to first run · 1 day+

Real use needs an NVIDIA GPU, a PyTorch and CUDA toolchain, and downloading multi-gigabyte SANA checkpoints from Hugging Face.
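The requirements above can be checked before committing to a long download. The following is a minimal preflight sketch, not something shipped with the repository: the function name `preflight` and the 20 GB free-disk threshold are illustrative assumptions, and the real checkpoint sizes should be confirmed on Hugging Face.

```python
import importlib.util
import shutil


def preflight(min_free_gb=20.0, path="."):
    """Report whether the basic prerequisites for running SANA look satisfied.

    min_free_gb is an assumed placeholder, not a figure from the repo.
    """
    report = {
        # find_spec only checks that the package can be located, not that it works.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "diffusers_installed": importlib.util.find_spec("diffusers") is not None,
    }

    cuda_ok = False
    if report["torch_installed"]:
        try:
            import torch

            cuda_ok = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is treated the same as "no usable GPU".
            cuda_ok = False
    report["cuda_available"] = cuda_ok

    # Free space on the volume that would hold the downloaded checkpoints.
    free_gb = shutil.disk_usage(path).free / 1e9
    report["free_disk_gb"] = round(free_gb, 1)
    report["enough_disk"] = free_gb >= min_free_gb
    return report
```

Calling `preflight()` returns a dictionary of booleans plus the free disk space in gigabytes, which makes it easy to see which prerequisite is missing before starting setup.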

In plain English

SANA is a codebase from NVIDIA Labs for generating images and short videos from text prompts. The repository contains the training and inference code for a family of related models: SANA, SANA-1.5, SANA-Sprint, SANA-Video, SANA-WM, and Sol-RL. Each one targets a different size, resolution, or use case, and several have been accepted at major machine learning conferences such as ICLR, ICML, and ICCV.

The stated focus is efficiency. The original SANA model is described as a Linear Diffusion Transformer, a design meant to keep high-resolution image generation fast. SANA-Sprint is a one-step diffusion variant aimed at very fast inference. SANA-Video covers text-to-video and text-plus-image-to-video, with a 5-second model and an experimental setup that can stretch generation toward minute-long, real-time clips. SANA-WM, the most recent addition, is a 2.6B-parameter controllable world model that produces 720p, one-minute videos with six-degree-of-freedom camera control, pitched as a baseline for world modeling and embodied AI work.

The project is wired into a wide ecosystem. There are hosted demo links on Hugging Face and an MIT lab server, an API on Replicate, integration with ComfyUI, serving through SGLang with an OpenAI-compatible API, and recipes for post-training (supervised fine-tuning and reinforcement learning) through Cosmos-RL. Many of the models are also merged into the Hugging Face diffusers library.

This particular copy of the repository is a fork under the Sunwood-ai-labs account. The README is mirrored from the upstream NVlabs project and does not describe any fork-specific changes, so the content above describes the upstream SANA work it tracks.
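To make the "OpenAI-compatible API" idea concrete, here is a minimal sketch of what a client request might look like, using only the Python standard library. Everything specific in it is an assumption rather than something taken from the repository: the server address `http://localhost:30000/v1`, the model name `sana`, and the use of the OpenAI-style `/images/generations` path. Check the SGLang documentation for the endpoint and parameters it actually exposes for SANA.

```python
import json
import urllib.request

# Assumptions, not taken from the repo: address, port, and model name.
BASE_URL = "http://localhost:30000/v1"
MODEL_NAME = "sana"  # hypothetical served model name


def build_image_request(prompt, size="1024x1024", base_url=BASE_URL, model=MODEL_NAME):
    """Build (but do not send) an OpenAI-style image generation request."""
    payload = {"model": model, "prompt": prompt, "size": size, "n": 1}
    return urllib.request.Request(
        url=f"{base_url}/images/generations",  # assumed OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request to a running server and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send(build_image_request("a red fox in the snow"))` would perform the call. Keeping request construction separate from sending makes the payload easy to inspect and adjust once the real endpoint is known.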

Copy-paste prompts

Prompt 1

Walk me through running the SANA inference script on a single image prompt with a sensible default config

Prompt 2

Show me how to point ComfyUI at the SANA checkpoints from this repo

Prompt 3

Help me launch SANA through SGLang and hit it from the OpenAI Python client

Prompt 4

Explain how SANA-Sprint reaches one step inference and what trade-offs that brings

Prompt 5

Set up a small supervised fine-tune of SANA on my own image-caption pairs using the Cosmos-RL recipes

Frequently asked questions

What is sana?

A fork of NVIDIA's SANA repo with training and inference code for a family of efficient text-to-image and text-to-video diffusion models, including a 2.6B world model with camera control.

How hard is sana to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is sana for?

Mainly researchers.


Verify against the repo before relying on details.