tongyi-mai/z-image

★ 11,254 | Python | Audience · developer | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((repo))
    Model variants
      Z-Image base
      Z-Image-Turbo
      Z-Image-Omni-Base
      Z-Image-Edit
    Capabilities
      Text to image
      Image editing
      Fast generation
    Requirements
      16GB GPU minimum
      Hugging Face weights
    Use cases
      Custom fine tuning
      Photorealistic output
      Chinese text rendering


Things people build with this

USE CASE 1

Generate photorealistic images from text prompts using Z-Image-Turbo on a consumer GPU with 16GB of VRAM

USE CASE 2

Edit existing images using natural-language instructions with the Z-Image-Edit variant

USE CASE 3

Fine-tune the Z-Image-Omni-Base checkpoint to build a custom image generation model for a specific visual style or domain

USE CASE 4

Benchmark Z-Image-Turbo against other open-source text-to-image models using the live demo on Hugging Face Spaces

Tech stack

Python · Hugging Face · ModelScope

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires a GPU with at least 16GB of VRAM for the Turbo variant. Larger model variants may require significantly more GPU memory.
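As a rough sanity check on the 16GB figure, the sketch below estimates how much memory the weights alone would occupy at common numeric precisions, using the 6-billion-parameter size cited on this page. It is back-of-envelope arithmetic under stated assumptions, not a measurement: it ignores the text encoder, the image decoder, activations, and framework overhead, and the function name `weight_memory_gib` is made up for illustration.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


PARAMS = 6e9  # parameter count cited on this page

# Bytes per parameter for a few common storage formats.
PRECISIONS = {"float32": 4, "bfloat16 / float16": 2, "int8": 1}

for name, width in PRECISIONS.items():
    gib = weight_memory_gib(PARAMS, width)
    # Headroom is what is left for everything else on a 16 GiB card.
    headroom = 16 - gib
    print(f"{name:>18}: {gib:5.1f} GiB of weights, {headroom:+5.1f} GiB headroom")
```

At 16-bit precision the weights come to roughly 11.2 GiB, which is consistent with the Turbo variant fitting on a 16GB card with some room left for activations. At 32-bit precision the weights alone would exceed 16 GiB.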

Released as open-source model weights. Check the repository and the Hugging Face page for the specific license terms.

In plain English

Z-Image is an AI image generation model family developed by Tongyi-MAI and released in late 2025. Built on a 6-billion-parameter architecture, the family generates images from text descriptions and is available as open-source checkpoints on Hugging Face and ModelScope. The family includes four variants.

Z-Image is the foundation model, focused on high visual quality, aesthetic range, and variety across artistic styles, identities, poses, and compositions. It supports negative prompting and is designed to be straightforward to fine-tune for custom applications.

Z-Image-Turbo is a faster, distilled version that generates images in under a second on enterprise hardware and fits within 16GB of GPU memory, making it usable on consumer-grade graphics cards. It is optimized for photorealistic output, text rendering in both English and Chinese, and close adherence to written instructions. In a December 2025 benchmark, Z-Image-Turbo ranked as the top open-source model on a third-party text-to-image leaderboard.

Z-Image-Omni-Base is a general checkpoint capable of both generating and editing images. It is intended as the most flexible starting point for researchers and developers who want to build custom fine-tuned variants from scratch.

Z-Image-Edit is a version specifically fine-tuned for editing existing images based on natural-language instructions.

Model weights for Z-Image and Z-Image-Turbo are available on Hugging Face and ModelScope, with live demo spaces provided for both. The Omni-Base and Edit variants were listed as coming soon at the time of the README. A technical report covering the architecture and training process is available on arXiv.
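For readers who want to try the released Turbo weights, here is a minimal sketch of text-to-image generation through the Hugging Face `diffusers` library. Several things here are assumptions rather than details taken from this page: the repo id `Tongyi-MAI/Z-Image-Turbo`, that the installed `diffusers` version includes Z-Image support so the generic `DiffusionPipeline` loader can resolve it, and the step count, which is an illustrative default. The function name `generate_image` is hypothetical. Check the model card for the recommended settings.

```python
# Assumed Hugging Face repo id; verify it on the model page before use.
MODEL_ID = "Tongyi-MAI/Z-Image-Turbo"


def generate_image(prompt: str, out_path: str = "z_image_out.png",
                   steps: int = 8, seed: int = 0) -> str:
    """Generate one image from `prompt`, save it to `out_path`, return the path."""
    # Imported lazily so this file can be imported without a GPU stack installed.
    import torch
    from diffusers import DiffusionPipeline

    # bfloat16 halves weight memory relative to float32.
    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

    # A seeded generator makes the result reproducible.
    generator = torch.Generator(device="cuda").manual_seed(seed)

    # `steps` is an illustrative default, not a documented recommendation.
    result = pipe(prompt=prompt, num_inference_steps=steps, generator=generator)
    image = result.images[0]
    image.save(out_path)
    return out_path
```

Calling `generate_image("a red bicycle leaning against a brick wall")` on a machine with a CUDA GPU would download the weights on first use and write a PNG to the working directory, provided the assumptions above hold.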

Copy-paste prompts

Prompt 1

How do I run Z-Image-Turbo to generate a photorealistic image from a text prompt on a GPU with 16GB VRAM?

Prompt 2

I want to fine-tune Z-Image-Omni-Base for a custom art style. Where do I download the weights, and how do I start training?

Prompt 3

How do I use Z-Image-Edit to modify an existing photo using a natural-language instruction like "change the background to a sunset"?

Prompt 4

What is the difference between Z-Image, Z-Image-Turbo, Z-Image-Omni-Base, and Z-Image-Edit, and which one should I use for generating fast photorealistic output?


Verify against the repo before relying on details.