explaingit

ymcui/chinese-bert-wwm

10,208 | Python | Audience · researcher | Complexity · 3/5 | Setup · moderate

TLDR

Pre-trained Chinese BERT language models using Whole Word Masking, ready to load with two lines of Python via HuggingFace Transformers for tasks like text classification, named entity recognition, and question answering.

Mindmap

mindmap
  root((repo))
    What it does
      Chinese NLP models
      Whole Word Masking
      Pre-trained weights
    Model Variants
      BERT-wwm base
      BERT-wwm-ext
      RoBERTa-wwm-ext
      Lightweight 3-layer
    Downstream Tasks
      Text classification
      Named entity recognition
      Question answering
      Sentence similarity
    Setup
      HuggingFace download
      Chinese cloud storage
      Two-line Python load


Things people build with this

USE CASE 1

Fine-tune a Chinese text classifier for customer review sentiment analysis using the pre-trained weights.

USE CASE 2

Build a Chinese named entity recognition pipeline for news articles by fine-tuning on labeled data.

USE CASE 3

Set up a Chinese question-answering system for a search or support product with HuggingFace Transformers.

USE CASE 4

Use the lightweight 3-layer model variant for fast Chinese sentence similarity scoring in production.

Tech stack

Python · PyTorch · HuggingFace Transformers · BERT

Getting it running

Difficulty · moderate | Time to first run · 30min

Fine-tuning is practical only with a GPU. Loading a model from HuggingFace for inference takes two lines of Python and works on CPU.
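As a rough illustration of the "two lines of Python" mentioned here, the sketch below wraps the load in a small helper. The helper name `load_model` and the `MODEL_IDS` table are hypothetical, introduced only for this example. The Hub IDs under the `hfl` organisation are an assumption to check against the repo's README. The sketch also assumes the README's advice to use the `Bert*` classes for every variant, including the RoBERTa-wwm ones.

```python
# Assumed Hugging Face Hub IDs (published under the "hfl" organisation).
# Verify them against the repository README before relying on this table.
MODEL_IDS = {
    "BERT-wwm": "hfl/chinese-bert-wwm",
    "BERT-wwm-ext": "hfl/chinese-bert-wwm-ext",
    "RoBERTa-wwm-ext": "hfl/chinese-roberta-wwm-ext",
    "RoBERTa-wwm-ext-large": "hfl/chinese-roberta-wwm-ext-large",
    "RBT3": "hfl/rbt3",
}


def load_model(variant="BERT-wwm-ext"):
    """Return (tokenizer, model) for one variant; downloads weights on first use."""
    # Look up the ID first so an unknown variant fails fast with KeyError.
    model_id = MODEL_IDS[variant]

    # Imported lazily so the table above is usable without transformers installed.
    from transformers import BertModel, BertTokenizer

    # The "two lines": one for the tokenizer, one for the weights.
    tokenizer = BertTokenizer.from_pretrained(model_id)
    model = BertModel.from_pretrained(model_id)
    return tokenizer, model
```

Calling `load_model("RBT3")` fetches the files from the Hub on first use and reads them from the local cache afterwards. Passing the returned tokenizer's output (with `return_tensors="pt"`) to the model should then give contextual embeddings on CPU.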

In plain English

This repository provides a set of pre-trained Chinese language models based on BERT, a neural network architecture widely used for natural language processing. Its core contribution is applying a training technique called Whole Word Masking to Chinese text. In standard BERT pre-training, individual tokens (for Chinese, single characters) are randomly hidden and the model learns to predict them. Because Chinese is written without spaces between words, this can hide only part of a multi-character word. Whole Word Masking instead hides all characters of a word at once, which pushes the model to learn word-level meaning rather than only character-level patterns.

The repository distributes several variants: the base BERT-wwm model trained on Chinese Wikipedia; an extended version (BERT-wwm-ext) trained on a much larger dataset of 5.4 billion words drawn from Wikipedia, news, and question-answer sources; a RoBERTa-based version, also offered in a large size, that uses the same masking technique with additional training improvements; and smaller 3-layer and 4-layer versions for situations where a lighter model is needed.

All models can be downloaded from HuggingFace, where they load with two lines of Python using the Transformers library, or from Chinese cloud storage for users in mainland China. Each download includes the model weights, a configuration file, and a vocabulary list.

The models are intended as starting points for downstream tasks such as text classification, named entity recognition, question answering, and sentence similarity. A researcher or developer building a Chinese language application would load one of them and fine-tune it on their own labeled data rather than train a language model from scratch. The README also reports benchmark results on several standard Chinese NLP evaluation sets to show how the variants compare.
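To make the Whole Word Masking idea from this section concrete, here is a minimal sketch that contrasts it with character-level masking. It is an illustration under simplifying assumptions, not the repository's training code. The sentence is segmented by hand, whereas a real pipeline would run a word segmenter. The function names are hypothetical, and BERT's usual mix of mask, random, and unchanged replacements is reduced to a plain `[MASK]`.

```python
import random

MASK = "[MASK]"


def char_level_mask(words, mask_prob, rng):
    """Character-level masking: each character is masked independently."""
    chars = [ch for word in words for ch in word]
    return [MASK if rng.random() < mask_prob else ch for ch in chars]


def whole_word_mask(words, mask_prob, rng):
    """Whole Word Masking: a selected word has all its characters masked."""
    out = []
    for word in words:
        if rng.random() < mask_prob:
            out.extend([MASK] * len(word))  # mask every character of the word
        else:
            out.extend(word)  # keep the word, one character per position
    return out


# Hand-segmented example sentence (illustrative only).
words = ["使用", "语言", "模型", "来", "预测", "下", "一个", "词", "的", "概率"]

rng = random.Random(0)
print(char_level_mask(words, 0.3, rng))
print(whole_word_mask(words, 0.3, rng))
```

In the first output a two-character word such as 模型 may end up half masked, while in the second each word is either fully visible or fully masked. That is the property the section describes.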

Copy-paste prompts

Prompt 1
Load the chinese-bert-wwm-ext model from HuggingFace and fine-tune it on a custom Chinese text classification dataset with 5 categories.
Prompt 2
Using chinese-bert-wwm, build a named entity recognition pipeline to extract person, organization, and location tags from Chinese news articles.
Prompt 3
Set up a Chinese question-answering system using RoBERTa-wwm-ext-large, and show me the training and inference code.
Prompt 4
How do I use the lightweight 3-layer chinese-bert-wwm model variant for faster CPU inference with the HuggingFace pipeline API?
Prompt 5
Compare chinese-bert-wwm and chinese-bert-wwm-ext on a sentence similarity task, and help me write the evaluation script.

Verify against the repo before relying on details.