Generate images locally with the SANA Linear Diffusion Transformer
Produce 5-second text-to-video clips with SANA-Video
Serve a SANA model through SGLang with an OpenAI-compatible API
Post-train SANA with supervised fine-tuning or RL via Cosmos-RL
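As background for the "Linear Diffusion Transformer" named in the first item: linear attention replaces the N x N score matrix of standard attention with per-key summaries that are computed once and reused for every query. The sketch below is a generic pure-Python illustration and is not SANA's code. The ReLU feature map, the function names, and the toy inputs are all assumptions made for the example. `quadratic_reference` computes the same kernelized attention the slow way so the two results can be compared.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention using key summaries.

    Cost grows linearly with the number of tokens N (roughly N * d * d_v).
    The ReLU feature map is an illustrative assumption.
    """
    d, dv = len(q[0]), len(v[0])
    # S[a][b] = sum_j relu(k_j[a]) * v_j[b];  z[a] = sum_j relu(k_j[a])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for kj, vj in zip(k, v):
        pk = [relu(x) for x in kj]
        for a in range(d):
            z[a] += pk[a]
            for b in range(dv):
                S[a][b] += pk[a] * vj[b]
    out = []
    for qi in q:
        pq = [relu(x) for x in qi]
        denom = sum(pq[a] * z[a] for a in range(d)) + eps
        out.append([sum(pq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out


def quadratic_reference(q, k, v, eps=1e-6):
    """Same attention, but with an explicit weight for every (query, key) pair.

    Cost grows with N * N, which is what linear attention avoids.
    """
    out = []
    for qi in q:
        pq = [relu(x) for x in qi]
        weights = [sum(a * relu(b) for a, b in zip(pq, kj)) for kj in k]
        denom = sum(weights) + eps
        out.append([sum(w * vj[b] for w, vj in zip(weights, v)) / denom
                    for b in range(len(v[0]))])
    return out


# Toy inputs: 3 tokens, key/query width 2, value width 3 (made-up numbers).
q = [[1.0, -2.0], [0.5, 3.0], [2.0, 1.0]]
k = [[0.3, 1.0], [-1.0, 2.0], [1.5, 0.2]]
v = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 1.0, 0.5]]
print(linear_attention(q, k, v))
print(quadratic_reference(q, k, v))
```

Both functions return the same values up to floating-point rounding. Only the order of the summations differs, and that reordering is why the cost stays manageable as image resolution, and so token count, increases.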
Real use requires an NVIDIA GPU, a PyTorch and CUDA toolchain, and multi-gigabyte SANA checkpoints downloaded from Hugging Face.
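A quick way to see whether a machine meets the prerequisites above is a small preflight script. The following is a minimal sketch that does not come from the SANA repository. The function name `preflight` and the choice of checks are assumptions. It only reports what it finds and downloads nothing.

```python
import importlib.util
import shutil


def preflight():
    """Report which prerequisites appear to be present on this machine."""
    report = {
        # The NVIDIA driver tools normally put nvidia-smi on PATH.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        # huggingface_hub is the usual client for fetching checkpoints.
        "huggingface_hub_installed":
            importlib.util.find_spec("huggingface_hub") is not None,
    }
    if report["torch_installed"]:
        try:
            import torch
            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is treated as "CUDA not available".
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A "no" on any line suggests the environment needs more setup before the real models will run. The script does not check GPU memory or free disk space for the checkpoints.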
SANA is a codebase from NVIDIA Labs for generating images and short videos from text prompts. The repository contains the training and inference code for a family of related models: SANA, SANA-1.5, SANA-Sprint, SANA-Video, SANA-WM, and Sol-RL. Each one targets a different size, resolution, or use case, and several of the associated papers have been accepted at major machine learning conferences such as ICLR, ICML, and ICCV.

The stated focus is efficiency. The original SANA model is described as a Linear Diffusion Transformer, a design meant to keep high-resolution image generation fast. SANA-Sprint is a one-step diffusion variant aimed at very fast inference. SANA-Video covers text-to-video and text-plus-image-to-video generation, with a 5-second model and an experimental setup that can stretch generation toward minute-long, real-time clips. SANA-WM, the most recent addition, is a 2.6B-parameter controllable world model that produces 720p, one-minute videos with six-degree-of-freedom camera control, pitched as a baseline for world modeling and embodied AI work.

The project is wired into a wide ecosystem. There are hosted demos on Hugging Face and an MIT lab server, an API on Replicate, integration with ComfyUI, serving through SGLang with an OpenAI-compatible API, and recipes for post-training (supervised fine-tuning and reinforcement learning) through Cosmos-RL. Many of the models are also merged into the Hugging Face diffusers library.

This particular copy of the repository is a fork under the Sunwood-ai-labs account. The README is mirrored from the upstream NVlabs project and does not describe any fork-specific changes, so everything above describes the upstream SANA work that the fork tracks.
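To make the SGLang serving path more concrete, here is a hedged sketch of an OpenAI-style image generation request built with the Python standard library. The `/v1/images/generations` path and the `model`, `prompt`, `n`, and `size` fields follow the public OpenAI Images API. It is an assumption that an SGLang-served SANA model accepts exactly this route. The base URL, the port, and the model name `sana` are hypothetical placeholders. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_image_request(prompt,
                        base_url="http://localhost:30000",  # hypothetical address
                        model="sana",                       # hypothetical model name
                        size="1024x1024"):
    """Build (but do not send) an OpenAI-style image generation request."""
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_image_request("a lighthouse at dusk, oil painting")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Against a running server, the request could be sent with:
#   with urllib.request.urlopen(req) as resp:
#       body = json.load(resp)
```

Check the route and field names against the repository's SGLang instructions before relying on them.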
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.