Generate photorealistic images from text prompts in Chinese or English using a state-of-the-art diffusion model.
Fine-tune the model on a specific person or art style using LoRA training with a small set of example images.
Fill in or replace a region of an existing image using the inpainting extension.
Create a virtual clothing try-on demo by placing garment images onto a person photo.
Requires a GPU with sufficient VRAM; model weights must be downloaded from Hugging Face or ModelScope before running.
Kolors is a text-to-image AI model built by the Kolors team at Kuaishou, the company behind the short-video platform popular in China. You type a description, and the model generates a photorealistic image matching that description. It was trained on billions of text-image pairs and supports both Chinese and English prompts, which makes it one of the few models with strong performance in both languages.

The model is based on a technique called latent diffusion, the same category of approach used by Stable Diffusion and similar systems. What sets it apart, according to its benchmarks, is visual quality and the ability to accurately follow complex descriptions. In a human evaluation by 50 imagery experts comparing it against Adobe Firefly, DALL-E 3, Midjourney v5 and v6, and others, Kolors came out with the highest overall satisfaction score and the highest visual appeal rating. An automated scoring system also placed it first among the same group of models.

Beyond basic text-to-image generation, the repository includes several extension tools. IP-Adapter lets you provide a reference image and have the model preserve a face or style from it. ControlNet lets you guide the composition using edge maps or depth maps, so the model follows a rough layout you define. There is also an inpainting model for filling in or replacing specific regions of an existing image, a LoRA training setup for fine-tuning the model on a specific subject or style with a small set of example images, and a virtual try-on demo that places clothing onto a person photo.

The code can be run using the standard Diffusers library from Hugging Face, through ComfyUI for a node-based visual workflow, or via a Gradio web interface for quick local testing. Model weights are available on Hugging Face and ModelScope. The repository is released under a license described in its accompanying license file.
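As a rough illustration of the Diffusers route described above, the sketch below wraps text-to-image generation in a small helper. It assumes a recent Diffusers release that ships `KolorsPipeline`, the `Kwai-Kolors/Kolors-diffusers` checkpoint on Hugging Face, and a CUDA GPU. The `GenerationSettings` class, the `generate_image` helper, and all default values are hypothetical names and choices made for this example, not part of the repository.

```python
from dataclasses import dataclass, asdict


@dataclass
class GenerationSettings:
    """Illustrative settings (hypothetical helper; defaults are assumptions)."""
    prompt: str                      # Chinese or English description
    negative_prompt: str = ""
    guidance_scale: float = 5.0
    num_inference_steps: int = 50
    height: int = 1024
    width: int = 1024

    def validate(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        # Latent diffusion pipelines operate on a downsampled latent grid,
        # so image sides are typically required to be multiples of 8.
        if self.height % 8 or self.width % 8:
            raise ValueError("height and width must be multiples of 8")


def generate_image(settings: GenerationSettings,
                   model_id: str = "Kwai-Kolors/Kolors-diffusers",
                   seed: int = 0):
    """Load the pipeline and return one PIL image (needs a GPU and the weights)."""
    settings.validate()

    # Heavy imports are deferred so the settings helper works without them.
    import torch
    from diffusers import KolorsPipeline

    # Assumption: an fp16 variant of the weights is published for this model id.
    pipe = KolorsPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")

    # A fixed seed makes repeated runs comparable.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    result = pipe(**asdict(settings), generator=generator)
    return result.images[0]
```

Calling `generate_image(GenerationSettings(prompt="a red panda reading a book, photorealistic"))` downloads the weights on first use, which is large and slow; the returned image can then be written to disk with its `save` method. Check the model card for the recommended step count and guidance scale before relying on these defaults.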
Verify against the repo before relying on details.