explaingit

stability-ai/generative-models

27,136
Python
Audience · developer
Complexity · 4/5
License
Setup · hard

TLDR

Research diffusion models from Stability AI that generate novel-view videos and multi-view imagery from a single image or short video.

Mindmap

mindmap
  root((repo))
    What it does
      Video to 3D views
      Image to 360 video
      Novel view synthesis
    Models included
      SV4D 2.0
      SV3D
      Stable Video Diffusion
    How to use
      Download from HuggingFace
      Run locally on GPU
      Python-based
    Use cases
      3D asset generation
      Creative tool building
      AI video research

Things people build with this

USE CASE 1

Generate 3D object views from a single video by rendering the same object from multiple camera angles.

USE CASE 2

Create 360-degree orbital videos around objects captured in a single still image.

USE CASE 3

Build creative applications that turn 2D images or videos into multi-view 3D-like experiences.

Tech stack

Python, PyTorch, CUDA, Hugging Face, Diffusion models

Getting it running

Difficulty · hard
Time to first run · 1h+

Requires CUDA-capable GPU, large model downloads, and PyTorch/CUDA environment setup.
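Before starting the large downloads, it can help to confirm the basics on the machine. The following is a minimal sketch of such a check, not something shipped with the repository: `preflight` is a hypothetical helper, and the 20 GB free-disk threshold is an arbitrary assumption rather than a documented requirement.

```python
import importlib.util
import shutil


def preflight(download_dir: str = ".", min_free_gb: float = 20.0) -> dict:
    """Report whether PyTorch, CUDA, and free disk space look usable.

    Hypothetical helper; min_free_gb is an assumed placeholder, not a
    figure taken from the repository.
    """
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}

    # Only import torch if it is actually installed, so the check itself
    # never fails on a machine that has not been set up yet.
    report["cuda_available"] = False
    if report["torch_installed"]:
        import torch

        report["cuda_available"] = bool(torch.cuda.is_available())

    # Free space on the drive that would hold the downloaded weights.
    free_gb = shutil.disk_usage(download_dir).free / 1e9
    report["free_disk_gb"] = round(free_gb, 1)
    report["enough_disk"] = free_gb >= min_free_gb
    return report


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back `False`, the PyTorch/CUDA setup is the first thing to fix before trying any sampling script.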

Use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

This repository, Generative Models by Stability AI, is the home for a series of research models that generate visual content from images and short videos. The README walks through releases by date.

Stable Video 4D 2.0 (SV4D 2.0) is described as a video-to-4D diffusion model: it takes a short input video of a moving object and produces novel-view videos that look like the same scene filmed from other camera angles. The earlier Stable Video 4D and Stable Video 3D models are also documented; SV3D is described as an image-to-video model that generates multiple synthetic views from a single picture. These are diffusion models, the family of generative AI systems that produce images or videos by gradually refining noise into a coherent output guided by an input.

The README gives practical numbers for SV4D 2.0: from a 12-frame input, ideally clean white-background footage of a single moving object, it generates 48 frames (12 video frames for each of 4 camera views) at 576-by-576 resolution. Longer outputs are produced by running the model in steps and feeding earlier results back in.

Sampling scripts accept a gif or mp4 file, a folder of frames, or a filename pattern; they download weights from Hugging Face and write generated frames to an output folder. Options cover sampling steps, camera elevation, background removal, and running on cards with less memory.

Someone would use this repository for research in synthesizing new views of objects from limited footage, for example multi-view content generation or 4D asset creation. The README marks the releases as for research purposes. It is written in Python and uses PyTorch with CUDA.
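The three input types and the 48-frame output described above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the repository's code: `classify_input` and `output_grid` are hypothetical helpers, and the real sampling scripts may resolve inputs and order frames differently.

```python
from pathlib import Path

VIDEO_SUFFIXES = {".gif", ".mp4"}  # the two video formats named in the description
WILDCARDS = "*?["                  # characters that mark a filename pattern


def classify_input(input_path: str) -> str:
    """Return 'pattern', 'folder', or 'video' for an input path (hypothetical helper)."""
    if any(ch in input_path for ch in WILDCARDS):
        return "pattern"  # e.g. frames/*.png
    path = Path(input_path)
    if path.is_dir():
        return "folder"   # a directory of individual frames
    if path.suffix.lower() in VIDEO_SUFFIXES:
        return "video"    # a single gif or mp4 file
    raise ValueError(f"Unsupported input: {input_path!r}")


def output_grid(num_frames: int = 12, num_views: int = 4) -> list:
    """List every (frame_index, view_index) pair in the generated output.

    Defaults follow the numbers quoted above: 12 video frames, 4 camera views.
    """
    return [(t, v) for t in range(num_frames) for v in range(num_views)]


if __name__ == "__main__":
    print(classify_input("frames/*.png"))
    print(len(output_grid()))
```

With the default arguments, `output_grid()` has 12 × 4 = 48 entries, which matches the frame count quoted for SV4D 2.0.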

Copy-paste prompts

Prompt 1
How do I set up SV4D 2.0 from this repo to convert a video of an object into multi-view 3D renders?
Prompt 2
Show me the code to download and run SV3D locally to generate a 360-degree video from a single image.
Prompt 3
What GPU memory do I need to run these Stability AI models, and how do I optimize for faster generation?

Verify against the repo before relying on details.