ofa-sys/chinese-clip

★ 5,900 | Jupyter Notebook | Audience · researcher | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((Chinese-CLIP))
    What it does
      Image-text matching
      Zero-shot classification
      Feature extraction
    Model Sizes
      77M parameters smallest
      958M parameters largest
      5 size options
    Use Cases
      Chinese image search
      Product classification
      Research benchmarks
    Tech Stack
      Python
      PyTorch
      ONNX
      TensorRT
    Setup
      Hugging Face weights
      ModelScope option
      Python API

Things people build with this

USE CASE 1

Build a Chinese-language image search engine that retrieves photos matching a text query written in Chinese

USE CASE 2

Classify product images into categories described in plain Chinese without collecting labeled training data

USE CASE 3

Fine-tune the model on a specialized Chinese image-text dataset for a domain-specific retrieval task

USE CASE 4

Export the model to ONNX or TensorRT for fast inference in a production service
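The image-search use case comes down to ranking stored image embeddings by their similarity to a query embedding. The sketch below shows only that ranking step, using plain Python and made-up three-dimensional vectors. The function names (`l2_normalize`, `search`) and the toy index are hypothetical and are not part of the Chinese-CLIP package. In a real system the vectors would come from the model's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(query_embedding, image_index, top_k=3):
    """Return the top_k (path, score) pairs ranked by cosine similarity.

    image_index is a dict mapping an image path to its embedding.
    """
    q = l2_normalize(query_embedding)
    scored = []
    for path, embedding in image_index.items():
        e = l2_normalize(embedding)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((score, path))
    scored.sort(reverse=True)  # highest similarity first
    return [(path, score) for score, path in scored[:top_k]]


# Toy data only: stand-ins for embeddings a real encoder would produce.
toy_index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
toy_query = [1.0, 0.2, 0.0]  # pretend this encodes a Chinese query about cats
results = search(toy_query, toy_index, top_k=2)
```

For large collections, the linear scan would usually be replaced by a vector index, but the scoring idea stays the same.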

Tech stack

Python, PyTorch, ONNX, TensorRT, Jupyter Notebook

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires downloading large model weights from Hugging Face or ModelScope. The larger models need significant GPU memory.

In plain English

Chinese-CLIP is an AI model trained to understand the relationship between images and Chinese text. You give it an image and a set of Chinese captions, and it tells you which caption best matches the image; it can also go the other way and pick the best image for a caption. It can classify images into categories described in plain Chinese without being trained on those categories first.

The project is a Chinese-language adaptation of OpenAI's original CLIP model, trained on roughly 200 million Chinese image-text pairs. The goal was a model that handles Chinese queries accurately, since the English CLIP model performs poorly on Chinese input. The code builds on the open_clip project and adds optimizations for Chinese data.

Five model sizes are available, ranging from 77 million parameters (RN50) up to 958 million parameters (ViT-H/14). Larger models generally produce better retrieval scores but require more memory to run. Pre-trained weights can be downloaded from Hugging Face or from ModelScope, a Chinese model hosting platform. The models are also integrated into the Hugging Face transformers library, so they can be loaded with a few lines of standard Python code.

The package provides a Python API for computing image and text features and measuring how similar they are to each other. It also includes training code for fine-tuning on new datasets, zero-shot image classification scripts, and tools for exporting models to ONNX and TensorRT formats for faster inference in production environments.

The technical paper is available on arXiv, and the repository reports benchmark results on several Chinese retrieval datasets. The project was developed by the OFA-Sys team.
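As a rough illustration of the caption-matching and zero-shot flow described above, here is a minimal sketch that goes through the Hugging Face transformers integration. It assumes the `ChineseCLIPModel` and `ChineseCLIPProcessor` classes and the checkpoint id `OFA-Sys/chinese-clip-vit-base-patch16`; check both against the repository before relying on them. The helper names (`softmax`, `best_label`, `score_image_against_labels`) are hypothetical. Heavy imports are deferred into the function, so the scoring helpers can be tried without downloading any weights.

```python
import math


def softmax(logits):
    """Turn raw similarity logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_label(labels, logits):
    """Return the label with the highest probability, and that probability."""
    probs = softmax(logits)
    idx = max(range(len(labels)), key=probs.__getitem__)
    return labels[idx], probs[idx]


def score_image_against_labels(
    image_path,
    labels,
    checkpoint="OFA-Sys/chinese-clip-vit-base-patch16",  # assumed checkpoint id
):
    """Return one image-text similarity logit per Chinese label.

    Downloads model weights on first use; needs torch, transformers and Pillow.
    """
    from PIL import Image
    import torch
    from transformers import ChineseCLIPModel, ChineseCLIPProcessor

    model = ChineseCLIPModel.from_pretrained(checkpoint)
    processor = ChineseCLIPProcessor.from_pretrained(checkpoint)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_images, num_labels); one image here.
    return outputs.logits_per_image[0].tolist()
```

A typical call would be `best_label(labels, score_image_against_labels("photo.jpg", labels))`, where `labels` is a list of Chinese category names and `photo.jpg` is a placeholder path.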

Copy-paste prompts

Prompt 1

Using Chinese-CLIP, write Python code to load the ViT-B/16 model from Hugging Face and find the best-matching image from a folder given a Chinese text query.

Prompt 2

Show me how to run zero-shot image classification with Chinese-CLIP for 10 product categories described in Chinese.

Prompt 3

Help me fine-tune Chinese-CLIP on my own dataset of Chinese product images and descriptions using the provided training scripts.

Prompt 4

Write code to export a Chinese-CLIP model to ONNX and benchmark its inference speed on CPU.

Verify against the repo before relying on details.