explaingit

stability-ai/generative-models

27,136 | Python | Audience · developer | Complexity · 4/5 | Maintained | License | Setup · hard

TLDR

Research generative models from Stability AI that use diffusion to synthesize novel-view and multi-view videos of objects from a single image or a short video.

Mindmap

mindmap
  root((repo))
    What it does
      Video to 3D views
      Image to 360 video
      Novel view synthesis
    Models included
      SV4D 2.0
      SV3D
      Stable Video Diffusion
    How to use
      Download from HuggingFace
      Run locally on GPU
      Python-based
    Use cases
      3D asset generation
      Creative tool building
      AI video research

Things people build with this

USE CASE 1

Generate novel views of an object from a single input video, as if the same object had been filmed from several camera angles.

USE CASE 2

Create 360-degree orbital videos around objects captured in a single still image.
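An orbital (turntable) video like this corresponds to cameras spaced evenly on a circle around the object at a fixed elevation. The sketch below computes those camera poses. It is illustrative only: the default of 21 views, the 2.0 radius, the y-up convention, and the name `orbit_cameras` are assumptions, not this repository's API.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraPose:
    azimuth_deg: float    # rotation around the vertical (y) axis
    elevation_deg: float  # angle above the horizontal plane
    position: tuple[float, float, float]  # (x, y, z) with y pointing up


def orbit_cameras(num_views: int = 21, elevation_deg: float = 10.0,
                  radius: float = 2.0) -> list[CameraPose]:
    """Hypothetical helper: place num_views cameras evenly on a 360-degree orbit."""
    poses = []
    elev = math.radians(elevation_deg)
    for i in range(num_views):
        # Evenly spaced azimuths; the last view stops one step short of 360 degrees.
        az_deg = 360.0 * i / num_views
        az = math.radians(az_deg)
        # Spherical-to-Cartesian conversion, camera looking at the origin.
        x = radius * math.cos(elev) * math.sin(az)
        y = radius * math.sin(elev)
        z = radius * math.cos(elev) * math.cos(az)
        poses.append(CameraPose(az_deg, elevation_deg, (x, y, z)))
    return poses


for pose in orbit_cameras(num_views=4, elevation_deg=20.0):
    print(pose.azimuth_deg, [round(c, 3) for c in pose.position])
```

For example, `orbit_cameras(num_views=8, elevation_deg=20.0)` returns eight poses 45 degrees apart. Raising the elevation moves every camera higher above the object, which is roughly what a camera-elevation option controls.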

USE CASE 3

Build creative applications that turn 2D images or videos into multi-view 3D-like experiences.

Tech stack

Python | PyTorch | CUDA | Hugging Face | Diffusion models

Getting it running

Difficulty · hard | Time to first run · 1h+

Requires a CUDA-capable GPU, large model downloads, and a working PyTorch/CUDA environment.
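Before downloading multi-gigabyte checkpoints, it can save time to confirm that PyTorch actually sees a CUDA GPU. A minimal check, assuming PyTorch is the framework in use (as the tech stack suggests); `describe_gpu` is a hypothetical helper, not part of the repository:

```python
def describe_gpu() -> str:
    """Report whether PyTorch can see a CUDA GPU and how much memory it has."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "PyTorch is installed but no CUDA GPU is visible"
    props = torch.cuda.get_device_properties(0)  # first visible GPU
    total_gib = props.total_memory / 1024**3
    return f"{props.name}: {total_gib:.1f} GiB total memory (CUDA {torch.version.cuda})"


if __name__ == "__main__":
    print(describe_gpu())
```

If this reports no visible GPU, the usual suspects are a CPU-only PyTorch build or a driver/CUDA version mismatch, both worth fixing before going further.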

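Use freely for any purpose, including commercial use, as long as you keep the copyright notice. This describes the code license; the model weights on Hugging Face are distributed under their own licenses, so check each model card.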

In plain English

This repository, "Generative Models by Stability AI," holds the official code and model checkpoints for a series of research generative models released by Stability AI. Generative models are AI systems that produce new content (here, mostly images and short videos) from inputs such as a single image or a short clip. Researchers and developers use the repository to download the models, run sample scripts, and reproduce or extend the published results.

The README walks through several specific releases. The most recent is Stable Video 4D 2.0 (SV4D 2.0), a video-to-4D model: given a short input video of an object, it synthesizes new views of that object from different camera angles, producing a video that looks as if multiple cameras had recorded the same moving scene. The earlier Stable Video 4D did the same job with different frame and view counts. Before that, SV3D took a single image of an object and produced a short orbital video of it from new viewpoints.

For each release, the README provides a quickstart command, instructions for downloading the model weights from Hugging Face, and options for changing the number of sampling steps, the input video length, and the rendered camera elevation. It also gives tips for removing the background from real-world input videos and for running on GPUs with limited VRAM by using smaller batch sizes or a lower resolution.

You would use this for research or experiments in novel-view synthesis or generative video: for example, turning a single product photo into a turntable video, or generating new camera angles of a captured clip. The code is written in Python. This summary is based on only part of the README; the full README covers more.
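The sampling-steps option trades quality for speed: each step costs roughly the same number of network evaluations, so sampling time scales about linearly with the step count. As an illustration only, this sketch computes the noise levels of an EDM/Karras-style schedule to show how fewer steps mean larger jumps between noise levels. The defaults (`sigma_min=0.002`, `sigma_max=80`, `rho=7`) are the commonly cited EDM values and are assumptions here, not values read from this repository's configs.

```python
def karras_sigmas(num_steps: int, sigma_min: float = 0.002,
                  sigma_max: float = 80.0, rho: float = 7.0) -> list[float]:
    """Noise levels for an EDM/Karras-style schedule, from sigma_max down to sigma_min."""
    if num_steps < 2:
        raise ValueError("need at least two steps")
    lo = sigma_min ** (1.0 / rho)
    hi = sigma_max ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then raise back to the rho power,
    # which concentrates steps at low noise levels.
    return [(hi + i / (num_steps - 1) * (lo - hi)) ** rho for i in range(num_steps)]


for n in (50, 20, 10):
    sigmas = karras_sigmas(n)
    # Fewer steps: fewer denoiser calls, but each step covers a larger drop in noise.
    print(n, "steps, first jump:", round(sigmas[0] - sigmas[1], 2))
```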
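When an input clip is longer than the number of frames a model consumes, one common approach is to subsample it evenly. A minimal sketch; the function name is hypothetical, and the repository's scripts may instead trim or stride the video differently:

```python
def sample_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Hypothetical helper: pick num_frames indices spread evenly across a clip."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("frame counts must be positive")
    if num_frames == 1:
        return [0]
    # Evenly spaced positions from the first to the last frame, rounded to integers.
    # If num_frames > total_frames, some indices repeat (frames are duplicated).
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


print(sample_frame_indices(100, 5))
```

For a 100-frame clip, `sample_frame_indices(100, 5)` picks frames 0, 25, 50, 74 and 99, always keeping the first and last frame.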
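Background removal matters because object-centric models like these work best when the object sits on a plain background. Once a matting tool has produced an alpha mask (which tool is not specified here), the remaining steps are compositing onto a solid color and locating the object for cropping. A dependency-free sketch on RGBA pixel lists, assuming a white background; both helper names are hypothetical:

```python
from typing import Optional

Pixel = tuple[int, int, int, int]  # RGBA, each channel 0-255


def composite_on_white(image: list[list[Pixel]]) -> list[list[tuple[int, int, int]]]:
    """Blend each RGBA pixel over a white background using its alpha channel."""
    out = []
    for row in image:
        new_row = []
        for r, g, b, a in row:
            alpha = a / 255.0
            # Standard "over" blend: color * alpha + background * (1 - alpha).
            new_row.append(tuple(round(c * alpha + 255 * (1.0 - alpha)) for c in (r, g, b)))
        out.append(new_row)
    return out


def alpha_bbox(image: list[list[Pixel]]) -> Optional[tuple[int, int, int, int]]:
    """(left, top, right, bottom) of pixels with nonzero alpha, inclusive; None if empty."""
    coords = [(x, y) for y, row in enumerate(image)
              for x, (_, _, _, a) in enumerate(row) if a > 0]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)
```

In practice the same operations would be done on arrays with an image library; the bounding box is then used to crop and re-center the object before resizing.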
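Smaller batch sizes lower peak VRAM because only a few frames are held on the GPU at once; the pattern is the same whichever stage is chunked (for example, decoding latents back into frames). A framework-agnostic sketch; `decode_fn` stands in for whatever decoder the pipeline uses and is hypothetical:

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def decode_in_batches(latents: Sequence[T],
                      decode_fn: Callable[[Sequence[T]], list[U]],
                      batch_size: int = 4) -> list[U]:
    """Run a (hypothetical) decoder a few frames at a time to cap peak memory."""
    frames: list[U] = []
    for batch in chunked(latents, batch_size):
        # Peak memory scales with batch_size rather than with len(latents).
        frames.extend(decode_fn(batch))
    return frames
```

Lowering resolution helps for a related reason: halving width and height cuts the pixel count to a quarter, and activation memory shrinks at least roughly in proportion.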

Copy-paste prompts

Prompt 1
How do I set up SV4D 2.0 from this repo to convert a video of an object into multi-view 3D renders?
Prompt 2
Show me the code to download and run SV3D locally to generate a 360-degree video from a single image.
Prompt 3
What GPU memory do I need to run these Stability AI models, and how do I optimize for faster generation?

Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.