Generate short videos from text prompts at 720p using the MOVA model
Create music from text descriptions using the ACE-Step audio generation model
Edit an existing image by typing a natural-language instruction using JoyAI-Image
Train a controllable image generation model using the Diffusion Templates plugin framework
Requires GPU hardware with sufficient VRAM. The project is experimental, so APIs may change and issue response times can be slow.
DiffSynth-Studio is a Python library and engine for working with diffusion models, a type of AI system capable of generating images, videos, audio, and music from text prompts or other inputs. The project is maintained by the ModelScope Community and positions itself as an experimental playground for researchers and developers who want to explore what these generative models can do.

The work is split across two separate projects. DiffSynth-Studio is the experimental branch, where new model types and techniques get added quickly, sometimes at the cost of stability. DiffSynth-Engine is the companion project aimed at production deployment, offering more consistent behavior and higher performance. If you want to experiment with the newest generation capabilities, Studio is the entry point. If you want to ship something reliable, Engine is the intended path.

The range of supported models is broad. Recent additions include text-to-music generation via ACE-Step, video generation at 360p and 720p via MOVA, instruction-guided image editing via JoyAI-Image, and audio-video generation via LTX-2. Earlier models like Stable Diffusion 1.5 and SDXL are also supported for academic purposes. The project also introduced a Diffusion Templates framework in early 2026, described as a plugin system for training controllable generative models with lower setup overhead.

The team is small, with two main contributors, which the README explicitly acknowledges. New features arrive regularly, but issue response times can be slow. Anyone using this as a dependency in a real project should know that up front. Documentation exists in both English and Chinese, and there is a Discord community for questions.

Getting started requires Python and GPU hardware. The package installs via pip. Example scripts and per-model documentation are organized in the repo under the examples and docs directories. The full README is longer than what is summarized here.
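Since getting started depends on Python, a pip-installed package, and a GPU, a quick environment check before running any example script can save time. The sketch below uses only the Python standard library and does not call any DiffSynth-Studio API. The import name diffsynth, the minimum Python version, and the use of nvidia-smi as a stand-in for "an NVIDIA GPU driver is present" are all assumptions for illustration, so confirm the real requirements in the repo.

```python
import importlib.util
import shutil
import sys


def preflight(package="diffsynth", min_python=(3, 8)):
    """Run basic environment checks and return the results as a dict.

    Assumptions (not taken from the project's documentation):
    - `package` is the importable name of the pip-installed library.
    - `min_python` is a placeholder lower bound, not an official requirement.
    - Finding `nvidia-smi` on PATH only suggests an NVIDIA driver is installed;
      it says nothing about how much VRAM is available.
    """
    return {
        "python_ok": sys.version_info >= min_python,
        "package_installed": importlib.util.find_spec(package) is not None,
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A "no" for package_installed usually means the pip install step has not been run in the active environment. A "no" for nvidia_smi_found is only a hint, because other GPU vendors and containerized setups will not expose that tool.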
Verify against the repo before relying on details.