kwai-kolors/kolors

★ 4,614 | Python | Audience · developer | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((Kolors))
    What it does
      Text to image
      Chinese and English
      Photorealistic output
    Extensions
      IP-Adapter face preserve
      ControlNet layout guide
      Inpainting regions
      LoRA fine-tuning
      Virtual try-on
    Interfaces
      Diffusers library
      ComfyUI workflow
      Gradio web UI
    Audience
      AI researchers
      App developers
      Creative coders


Things people build with this

USE CASE 1

Generate photorealistic images from text prompts in Chinese or English using a state-of-the-art diffusion model.

USE CASE 2

Fine-tune the model on a specific person or art style using LoRA training with a small set of example images.

USE CASE 3

Fill in or replace a region of an existing image using the inpainting extension.

USE CASE 4

Create a virtual clothing try-on demo by placing garment images onto a person photo.

Tech stack

Python | PyTorch | Diffusers | ComfyUI | Gradio

Getting it running

Difficulty · hard | Time to first run · 1h+

Requires a GPU with sufficient VRAM. Model weights must be downloaded from Hugging Face or ModelScope before running.
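As a rough illustration of the download step, the sketch below fetches the weights once and skips the download on later runs. The repo id `Kwai-Kolors/Kolors-diffusers`, the local directory name, and the helper names `weights_present` and `fetch_weights` are assumptions made for this example, not details taken from the repository; check the README for the exact locations.

```python
from pathlib import Path

# Assumed Hugging Face repo id for Diffusers-format weights; verify in the README.
REPO_ID = "Kwai-Kolors/Kolors-diffusers"


def weights_present(local_dir: str) -> bool:
    """Return True if local_dir looks like a downloaded Diffusers pipeline."""
    # Diffusers pipelines keep a model_index.json file at the top level.
    return (Path(local_dir) / "model_index.json").is_file()


def fetch_weights(local_dir: str, repo_id: str = REPO_ID) -> str:
    """Download the weights into local_dir unless they are already there."""
    if weights_present(local_dir):
        return local_dir
    # Imported lazily so the presence check works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return local_dir
```

Calling `fetch_weights("weights/Kolors-diffusers")` would start a multi-gigabyte download on first use, so it is worth running once before any generation script.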

Governed by a separate license file included in the repository; the README does not describe the terms.

In plain English

Kolors is a text-to-image AI model built by the Kolors team at Kuaishou, the company behind the short-video platform popular in China. You type a description, and the model generates a photorealistic image matching that description. It was trained on billions of text-image pairs and supports both Chinese and English prompts, which makes it one of the few models with strong performance in both languages.

The model is based on a technique called latent diffusion, the same category of approach used by Stable Diffusion and similar systems. What sets it apart, according to its benchmarks, is visual quality and the ability to accurately follow complex descriptions. In a human evaluation by 50 imagery experts comparing it against Adobe Firefly, DALL-E 3, Midjourney v5 and v6, and others, Kolors came out with the highest overall satisfaction score and the highest visual appeal rating. An automated scoring system also placed it first among the same group of models.

Beyond basic text-to-image generation, the repository includes several extension tools. IP-Adapter lets you provide a reference image and have the model preserve a face or style from it. ControlNet lets you guide the composition using edge maps or depth maps, so the model follows a rough layout you define. There is also an inpainting model for filling in or replacing specific regions of an existing image, a LoRA training setup for fine-tuning the model on a specific subject or style with a small set of example images, and a virtual try-on demo that places clothing onto a person photo.

The code can be run using the standard Diffusers library from Hugging Face, through ComfyUI for a node-based visual workflow, or via a Gradio web interface for quick local testing. Model weights are available on Hugging Face and ModelScope. The repository is released under a license described in its accompanying license file.
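To make the Diffusers route concrete, here is a minimal sketch of text-to-image generation. It assumes a recent Diffusers release that ships `KolorsPipeline`, a CUDA GPU, and the repo id `Kwai-Kolors/Kolors-diffusers`. The helper names `output_path` and `generate`, and the step count, guidance scale, and seed, are illustrative choices rather than settings recommended by the repository.

```python
from pathlib import Path

# Assumed repo id for the Diffusers-format weights; verify in the README.
MODEL_ID = "Kwai-Kolors/Kolors-diffusers"


def output_path(prompt: str, out_dir: str = "outputs") -> Path:
    """Build a filesystem-safe PNG path from the prompt text."""
    # Keep letters and digits (including Chinese characters); replace the rest.
    slug = "".join(c if c.isalnum() else "_" for c in prompt.strip())[:40].strip("_")
    return Path(out_dir) / f"{slug or 'image'}.png"


def generate(prompt: str, out_dir: str = "outputs", steps: int = 25,
             guidance_scale: float = 6.5, seed: int = 0) -> Path:
    """Generate one image for the prompt and save it as a PNG."""
    # Heavy imports stay inside the function so the module loads without a GPU stack.
    import torch
    from diffusers import KolorsPipeline

    pipe = KolorsPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    # A fixed seed makes the run repeatable on the same hardware and versions.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(
        prompt=prompt,
        negative_prompt="",
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]
    path = output_path(prompt, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    image.save(path)
    return path
```

A call such as `generate("一张雪山日出的照片")` or `generate("a photo of a mountain sunrise")` would load the pipeline and write one PNG under `outputs/`. Loading the pipeline inside `generate` keeps the sketch short; a longer-running script would load it once and reuse it across prompts.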

Copy-paste prompts

Prompt 1

Using the Kolors model via the Hugging Face Diffusers library, write a Python script that takes a text prompt and saves the generated image to disk.

Prompt 2

Show me how to set up the Kolors IP-Adapter to generate an image that preserves a face from a reference photo while following a new text prompt.

Prompt 3

I want to run Kolors locally with a Gradio web UI. Walk me through the setup steps and the launch command.

Prompt 4

How do I use Kolors ControlNet with an edge map to guide the composition of a generated image? Show me the Python code.

Prompt 5

Write a script to fine-tune Kolors on 10 images of my product using the included LoRA training setup.


Verify against the repo before relying on details.