explaingit

doggy8088/opencc-py

11 | Python | Audience · developer | Complexity · 2/5 | Active | License | Setup · easy

TLDR

Pure Python library that converts Chinese text between Mainland Simplified, Hong Kong Traditional, Taiwan Traditional, and Japanese new-form characters. Ships with dictionaries, a CLI, and an HTML/XML converter.

Mindmap

mindmap
  root((opencc-py))
    Inputs
      Chinese text strings
      HTML or XML files
      Custom dictionary files
    Outputs
      Converted text
      Rewritten HTML
      CLI-edited files
    Use Cases
      Localize content for TW or HK
      Batch convert files
      Apply custom term rules
    Tech Stack
      Python
      ElementTree
      pip

Things people build with this

USE CASE 1

Convert a Simplified Chinese article to Taiwan Traditional for a regional site

USE CASE 2

Batch convert a folder of Markdown files with the opencc-py CLI

USE CASE 3

Add custom term rules on top of a built-in locale via converter_factory

USE CASE 4

Localize an HTML page while skipping script, style, and ignored elements

Tech stack

Python | ElementTree | pip

Getting it running

Difficulty · easy | Time to first run · 5min

Requires Python 3.11 or newer. No other setup is needed.

MIT license: use freely in commercial and personal projects, with attribution.

In plain English

opencc-py is a pure Python library for converting Chinese text between regional variants: Mainland Simplified, Hong Kong Traditional, Taiwan Traditional, and Japanese new-form characters. The author ported it from an earlier C# implementation and kept the same dictionaries, locale and preset definitions, longest-match lookup, and multi-stage conversion flow. The package targets Python 3.11 or newer and has no runtime dependencies on other packages.

It ships with built-in dictionaries for six locales, each identified by a short code: cn for Mainland Simplified, hk for Hong Kong Traditional, tw for Taiwan Traditional, tw2 for the Taiwan everyday-words variant, twp for Taiwan with extra IT terms and personal names, and jp for Japanese characters. A seventh code, t, is a pass-through that skips dictionary loading for that stage. Three presets are available: full, cn2t for Simplified to Traditional, and t2cn for the reverse.

Install with pip install opencc-py-tw2. The basic usage is to call converter(source_locale, target_locale), which returns an object you can call on a string. For example, converter("cn", "tw2")("a sentence in simplified characters") returns the Taiwan-form output.

Users can also pass their own dictionaries. The custom dictionary string format matches the C# version: each entry is a source followed by a target, entries are separated by a pipe character, and a tab can separate source from target when either one contains a space. The README also documents a converter_factory function that chains multiple DictGroup objects in order, which is useful when you want to apply your own rules on top of a built-in locale.
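
The dictionary format, longest-match lookup, and multi-stage flow described above can be illustrated with a small standalone sketch. It does not import opencc-py and is not the library's implementation. The names parse_dict_string, longest_match_convert, and chain are hypothetical, the tiny dictionaries are toy data rather than the shipped ones, and the rule "split on a tab if present, otherwise on the first space" is an assumption based on the format description.

```python
def parse_dict_string(text: str) -> dict[str, str]:
    """Parse 'source target|source target' into a mapping (assumed format)."""
    mapping: dict[str, str] = {}
    for entry in text.split("|"):
        if not entry:
            continue
        # Prefer a tab separator, since it allows spaces inside either side.
        sep = "\t" if "\t" in entry else " "
        source, _, target = entry.partition(sep)
        if source:
            mapping[source] = target
    return mapping


def longest_match_convert(text: str, mapping: dict[str, str]) -> str:
    """Replace the longest dictionary key found at each position."""
    if not mapping:
        return text
    max_len = max(len(key) for key in mapping)
    out: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in mapping:
                out.append(mapping[chunk])
                i += size
                break
        else:
            # No key starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def chain(*stages: dict[str, str]):
    """Return a callable that runs each dictionary stage in order."""
    def run(text: str) -> str:
        for mapping in stages:
            text = longest_match_convert(text, mapping)
        return text
    return run


# Toy stage 1: a few Simplified-to-Traditional entries (illustrative only).
base = parse_dict_string("汉 漢|语 語|软 軟|软件 軟體")
# Toy stage 2: a custom rule applied to the output of stage 1.
custom = parse_dict_string("軟體 App")
# A tab-separated entry, needed because both sides contain a space.
spaced = parse_dict_string("hello world\thi there")

converted = longest_match_convert("汉语软件", base)
chained = chain(base, custom)("汉语软件")
```

Because the two-character key 软件 is tried before the single character 软, the word is converted as a unit. That is the practical point of longest-match lookup: phrase-level entries take priority over character-level ones.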

There is also an HTML and XML converter built on Python's standard xml.etree.ElementTree. It converts text inside elements whose lang attribute matches the requested range, plus the content of meta description and keywords tags, image alt attributes, and button input values. It skips script and style tags and any element with the ignore-opencc class.

A command-line tool called opencc-py converts a file given source and target locales, with an optional output path or in-place editing.

The project is MIT licensed.
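
The skipping rules for the HTML and XML converter can be sketched with ElementTree directly. This is a simplified illustration, not opencc-py's converter. The function convert_tree is hypothetical, the lang check is a plain prefix match (an assumption about how "matches the requested range" behaves), the document is assumed to have no XML namespaces, and the meta and button input handling is left out. The convert callable is a toy character map standing in for a real converter.

```python
import xml.etree.ElementTree as ET

SKIP_TAGS = {"script", "style"}
IGNORE_CLASS = "ignore-opencc"


def convert_tree(elem, convert, lang_range, active=False):
    """Convert text in place for elements inside a matching lang scope."""
    if elem.tag in SKIP_TAGS or IGNORE_CLASS in elem.get("class", "").split():
        return  # leave this element and everything below it untouched
    lang = elem.get("lang")
    if lang is not None:
        # Assumed rule: a lang attribute switches the scope on or off.
        active = lang.lower().startswith(lang_range.lower())
    if active:
        if elem.text:
            elem.text = convert(elem.text)
        if elem.tag == "img" and elem.get("alt"):
            elem.set("alt", convert(elem.get("alt")))
    for child in elem:
        convert_tree(child, convert, lang_range, active)
        # A child's tail text sits in this element's scope, not the child's.
        if active and child.tail:
            child.tail = convert(child.tail)


doc = ET.fromstring(
    '<html lang="zh-CN"><body>'
    '<p>汉语</p>'
    '<p class="ignore-opencc">汉语</p>'
    '<script>var s = "汉语";</script>'
    '<p lang="en">汉语</p>'
    '</body></html>'
)

# Toy converter: maps two characters, standing in for a real dictionary.
toy = str.maketrans({"汉": "漢", "语": "語"})
convert_tree(doc, lambda s: s.translate(toy), "zh")
result = ET.tostring(doc, encoding="unicode")
```

Only the first paragraph changes in this example: the second carries the ignore-opencc class, the script body is skipped, and the last paragraph declares a lang that falls outside the requested range.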

Copy-paste prompts

Prompt 1
Write a Python script using opencc-py to convert a folder of .txt files from cn to tw2 and save them next to the originals
Prompt 2
Show me how to build a custom DictGroup in opencc-py that overrides three product names then chains it after the cn to tw preset
Prompt 3
Give me the opencc-py CLI command to convert an HTML file in-place from Simplified to Hong Kong Traditional
Prompt 4
Explain how to add the ignore-opencc class to a div so opencc-py skips it during HTML conversion

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.