explaingit

virtualluoucas/chronicles-ocr

117 · Python · Audience · researcher · Complexity · 4/5 · Active · Setup · hard

TLDR

Research benchmark of 2,800 images covering the seven historical Chinese scripts, with four tasks and a leaderboard comparing vision-language models on archaic character reading.

Mindmap

mindmap
  root((Chronicles-OCR))
    Inputs
      Script images
      Vision language model
      Task choice
    Outputs
      Character predictions
      Script class
      Benchmark scores
    Use Cases
      Evaluate VLMs on ancient Chinese
      Train models on historical scripts
      Compare open and closed models
    Tech Stack
      Python
      HuggingFace
      Vision Language Models

Things people build with this

USE CASE 1

Evaluate a vision-language model on reading archaic Chinese scripts

USE CASE 2

Train or fine-tune a model on the Seven Chinese Scripts dataset

USE CASE 3

Compare open-source VLMs against GPT and Gemini on character spotting
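Chronicles-OCR scores Character Spotting with F1 at an intersection-over-union (IoU) threshold, so a fair comparison comes down to how predicted boxes are matched to ground-truth boxes. The sketch below is a minimal, hypothetical version that assumes axis-aligned `(x1, y1, x2, y2)` boxes, a threshold of 0.5, greedy one-to-one matching by descending IoU, and purely positional matching (the character label is ignored). The benchmark's own threshold, box format, and matching rule may differ, so check the repo's evaluation code before comparing numbers.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def spotting_f1(preds: List[Box], gts: List[Box], thr: float = 0.5) -> float:
    """F1 where a prediction counts as a true positive if it claims an unused GT box with IoU >= thr.

    Assumption: greedy matching from the highest-IoU pair downward (not an optimal
    Hungarian assignment), and character labels are not checked.
    """
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(preds) for j, g in enumerate(gts)),
        reverse=True,
    )
    used_p, used_g = set(), set()
    tp = 0
    for score, i, j in pairs:
        if score < thr:
            break  # every remaining pair has even lower IoU
        if i in used_p or j in used_g:
            continue  # each box may be matched at most once
        used_p.add(i)
        used_g.add(j)
        tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(preds)  # tp / (tp + false positives)
    recall = tp / len(gts)       # tp / (tp + false negatives)
    return 2 * precision * recall / (precision + recall)
```

For example, with ground truth `[(0, 0, 10, 10), (20, 0, 30, 10)]` and predictions `[(1, 1, 10, 10), (50, 50, 60, 60)]`, one prediction overlaps a character with IoU 0.81 and the other hits nothing, so precision and recall are both 0.5 and F1 is 0.5. Greedy matching keeps the sketch short; an optimal assignment can differ slightly when many boxes overlap.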

USE CASE 4

Score a model on ancient text parsing using edit distance metrics
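Ancient Text Parsing is scored with a string edit distance against the correct transcription. As an illustration, here is a character-level Levenshtein distance plus a normalized variant averaged over a set of predictions. Normalizing by the longer string's length is an assumption made for this sketch; Chronicles-OCR may normalize differently or report a similarity rather than a distance. The function names are hypothetical.

```python
from typing import Sequence


def edit_distance(pred: str, ref: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(ref) + 1))  # distances from the empty prefix of pred
    for i, p in enumerate(pred, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete p
                curr[j - 1] + 1,         # insert r
                prev[j - 1] + (p != r),  # substitute, free when characters match
            )
        prev = curr  # only two rows of the DP table are kept
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string (assumed normalization)."""
    longest = max(len(pred), len(ref))
    return 0.0 if longest == 0 else edit_distance(pred, ref) / longest


def corpus_ned(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Mean normalized edit distance over paired predictions and references."""
    if len(preds) != len(refs) or not refs:
        raise ValueError("need equal-length, non-empty prediction and reference lists")
    return sum(normalized_edit_distance(p, r) for p, r in zip(preds, refs)) / len(refs)
```

With made-up strings, a prediction of 天下太乎 against the reference 天下太平 has one substitution, giving a normalized distance of 0.25; averaged with a perfect second transcription, `corpus_ned` returns 0.125. Lower is better under this convention.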

Tech stack

Python · HuggingFace

Getting it running

Difficulty · hard · Time to first run · 1 day+

Requires downloading the 2,800-image dataset and standing up a vision-language model with enough VRAM to run inference across all four tasks.

In plain English

Chronicles-OCR is a research benchmark: a fixed collection of test images and scoring rules used to compare how well different AI models can read Chinese writing. What makes it distinctive is that it covers the full historical span of Chinese characters, from the earliest carvings on bones and shells more than three thousand years ago to brush-and-paper calligraphy in styles still used today. The dataset contains exactly 2,800 images, split evenly at 400 per script across seven script styles known together as the Seven Chinese Scripts.

The README walks through each one: Oracle Bone Script, carved on tortoise shells in the Shang dynasty; Bronze Script, cast on ceremonial vessels; Seal Script, standardised after the Qin unification; Clerical Script, which flattened characters and marks the boundary between ancient and modern forms; Regular Script, the formal style still in use; and Cursive Script and Running Script, which developed for faster, informal writing. The first five were each the official script of their era, while the last two are auxiliary styles. The collection was assembled with the Key Laboratory of Oracle Bone Inscription Information Processing at Anyang Normal University and with the Palace Museum.

The benchmark defines four evaluation tasks. Character Spotting asks a model to locate each character in an image of an archaic script. Fine-grained Archaic Character Recognition asks the model to identify each individual character in the three oldest scripts. Ancient Text Parsing covers all seven scripts and is scored by string edit distance to the correct transcription. Script Classification asks the model which of the seven styles a given image belongs to. Each task has its own metric, ranging from F1 at an intersection-over-union (IoU) threshold to plain accuracy. Most of the README is taken up by a large leaderboard comparing many vision-language models on these tasks.
Open-source models listed include several sizes of InternVL, Qwen2.5-VL, Qwen3-VL, Qwen3.5, Gemma 4, MiniCPM-V, Molmo, Ovis2.6, GLM-4.5V, and Kimi K2.5. Proprietary models include GPT-4o, GPT-5, several Seed releases, MiMo-V2-Omni, and Gemini. Scores are consistently low on the archaic-script tasks, which shows how hard the benchmark is even for the strongest models. Links to the paper on arXiv and to the dataset on HuggingFace are provided at the top of the README.
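When comparing models on Script Classification, it can help to look past the single accuracy number and see which of the seven scripts a model confuses. The sketch below computes overall and per-script accuracy. The label strings are hypothetical stand-ins based on the script names in the README, not the dataset's actual field values. Because the full dataset is balanced at 400 images per script, overall accuracy on the complete set equals the plain mean of the seven per-script accuracies.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical label strings; the dataset's real labels may be spelled differently.
SCRIPTS = ["Oracle Bone", "Bronze", "Seal", "Clerical", "Regular", "Cursive", "Running"]


def script_accuracy(preds: List[str], golds: List[str]) -> Dict[str, float]:
    """Overall accuracy plus per-script accuracy for the 7-way classification task."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    total = Counter(golds)  # number of images per gold script
    correct = Counter(g for p, g in zip(preds, golds) if p == g)  # hits per gold script
    report = {"overall": sum(correct.values()) / len(golds)}
    for script in SCRIPTS:
        if total[script]:  # skip scripts absent from this evaluation slice
            report[script] = correct[script] / total[script]
    return report
```

On a small made-up slice where a model labels a Clerical image as Seal and a Cursive image as Running, the report shows 0.0 for Clerical and Cursive while Seal and Regular stay at 1.0, which points directly at the confused script pairs.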

Copy-paste prompts

Prompt 1
Walk me through downloading Chronicles-OCR from HuggingFace and running the Character Spotting task on Qwen3-VL
Prompt 2
Show me how the Ancient Text Parsing task computes edit distance and how to plug my own model in
Prompt 3
Help me reproduce the InternVL leaderboard numbers on the Script Classification task
Prompt 4
Sketch a fine-tuning pipeline that uses Chronicles-OCR's Oracle Bone subset to improve a small VLM
Prompt 5
Explain the four evaluation tasks in Chronicles-OCR and which one to start with if I only care about Regular Script

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.