Build a Chinese-language image search engine that retrieves photos matching a text query written in Chinese
Classify product images into categories described in plain Chinese without collecting labeled training data
Fine-tune the model on a specialized Chinese image-text dataset for a domain-specific retrieval task
Export the model to ONNX or TensorRT for fast inference in a production service
Requires downloading large model weights from Hugging Face or ModelScope; larger models need significant GPU memory.
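The image search use case above comes down to ranking stored image embeddings by their cosine similarity to the embedding of the Chinese text query. The sketch below is minimal and assumes the feature vectors have already been computed, for example with `get_text_features` and `get_image_features` on the transformers `ChineseCLIPModel`. The function names, file names and toy 3-dimensional vectors are hypothetical. A real index would hold model-sized vectors and would likely use a vector search library rather than pure Python.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def search_images(
    query_vec: Sequence[float],
    image_index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (image_id, score) pairs, highest cosine similarity first."""
    scored = [
        (image_id, cosine_similarity(query_vec, vec))
        for image_id, vec in image_index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: 3-dimensional stand-ins for real image features.
image_index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
# Hypothetical stand-in for the text features of a query such as "一只猫" ("a cat").
query_vec = [0.8, 0.2, 0.0]

results = search_images(query_vec, image_index, top_k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

If the index is large, a common choice is to normalize the image vectors once when building it. Each score then becomes a plain dot product against the normalized query.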
Chinese-CLIP is an AI model trained to understand the relationship between images and Chinese text. Given an image and a set of Chinese captions, it can tell you which caption best matches the image. Given a caption, it can likewise tell you which image best matches it. It can also classify images into categories described in plain Chinese without being trained on those categories first (zero-shot classification).

The project is a Chinese-language adaptation of OpenAI's original CLIP model, trained on roughly 200 million Chinese image-text pairs. The goal was a model that handles Chinese queries accurately, since the English CLIP model performs poorly on Chinese input. The code builds on the open_clip project and adds optimizations for Chinese data.

Five model sizes are available, ranging from 77 million parameters (RN50) to 958 million parameters (ViT-H/14). Larger models generally achieve better retrieval scores but need more memory to run. Pre-trained weights can be downloaded from Hugging Face or from ModelScope, a Chinese model hosting platform. The models are also integrated into the Hugging Face transformers library, so they can be loaded with a few lines of standard Python code.

The package provides a Python API for computing image and text features and measuring how similar they are. It also includes training code for fine-tuning on new datasets, zero-shot image classification scripts, and tools for exporting models to ONNX and TensorRT formats for faster inference in production. The technical paper is available on arXiv, and benchmark results on several Chinese retrieval datasets are reported in the repository. The project was developed by the OFA-Sys team.
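Zero-shot classification, as described above, can be framed as a scoring step. One image embedding is compared against the text embedding of one prompt per category, and a softmax turns the scaled similarities into probabilities. This is a minimal sketch under stated assumptions: the embeddings are toy stand-ins for real model features, and the prompt wording is hypothetical. The fixed `logit_scale` of 100 is an assumed value standing in for the model's learned temperature.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def zero_shot_probs(
    image_vec: Sequence[float],
    label_vecs: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumed stand-in for the model's learned temperature
) -> Dict[str, float]:
    """Softmax over scaled cosine similarities between one image and each label prompt."""
    img = l2_normalize(image_vec)
    logits = {
        label: logit_scale * sum(a * b for a, b in zip(img, l2_normalize(vec)))
        for label, vec in label_vecs.items()
    }
    # Subtract the largest logit before exponentiating for numerical stability.
    max_logit = max(logits.values())
    exps = {label: math.exp(value - max_logit) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical toy embeddings. In practice each label vector would be the text
# features of a Chinese prompt such as "一张猫的照片" ("a photo of a cat").
label_vecs = {
    "猫": [0.9, 0.1, 0.0],  # cat
    "狗": [0.1, 0.9, 0.0],  # dog
    "汽车": [0.0, 0.1, 0.9],  # car
}
image_vec = [0.7, 0.3, 0.1]

probs = zero_shot_probs(image_vec, label_vecs)
predicted = max(probs, key=probs.get)
print(predicted, {label: round(p, 3) for label, p in probs.items()})
```

Because the categories exist only as text prompts, adding a new category means encoding one more prompt. The model itself is never retrained.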
Verify against the repo before relying on details.