virtualluoucas/chronicles-ocr

Analysis updated 2026-06-24

★ 116 · Python · Audience: researcher · Complexity: 4/5 · Setup: hard

Mindmap

mindmap
  root((Chronicles-OCR))
    Inputs
      Script images
      Vision language model
      Task choice
    Outputs
      Character predictions
      Script class
      Benchmark scores
    Use Cases
      Evaluate VLMs on ancient Chinese
      Train models on historical scripts
      Compare open and closed models
    Tech Stack
      Python
      HuggingFace
      Vision Language Models



What do people build with it?

USE CASE 1

Evaluate a vision language model on archaic Chinese script reading

USE CASE 2

Train or fine-tune a model on the Seven Chinese Scripts dataset

USE CASE 3

Compare open source VLMs against GPT and Gemini on character spotting

USE CASE 4

Score a model on ancient text parsing using edit distance metrics

What is it built with?

Python, HuggingFace

How does it compare?

	virtualluoucas/chronicles-ocr	jackson-video-resources/markov-hedge-fund-method	upload-post/avatar-mix
Stars	116	120	112
Language	Python	Python	Python
Setup difficulty	hard	easy	hard
Complexity	4/5	3/5	4/5
Audience	researcher	developer	vibe coder

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty: hard · Time to first run: 1 day+

Requires downloading a 2,800-image dataset and standing up a vision-language model with enough VRAM to run inference across all four tasks.

In plain English

Chronicles-OCR is a research benchmark: a fixed collection of test images and scoring rules used to compare how well different AI models read Chinese writing. What makes it distinctive is that it covers the full historical span of Chinese characters, from the earliest carvings on bones and shells more than three thousand years ago to brush-and-paper calligraphy in styles still used today.

The dataset contains exactly 2,800 images, split evenly at 400 per script across seven script styles known collectively as the Seven Chinese Scripts. The README walks through each one: Oracle Bone Script, carved on tortoise shells in the Shang dynasty; Bronze Script, cast on ceremonial vessels; Seal Script, standardised after the Qin unification; Clerical Script, which flattened characters and marks the boundary between ancient and modern forms; Regular Script, the formal style still in use; and Cursive Script and Running Script, which developed for faster informal writing. The first five were each the official script of their era, while the last two are auxiliary styles. The collection was assembled with the Key Laboratory of Oracle Bone Inscription Information Processing at Anyang Normal University and with the Palace Museum.

The benchmark defines four evaluation tasks. Character Spotting asks a model to locate each character in an image of an archaic script. Fine-grained Archaic Character Recognition asks the model to identify each individual character in the three oldest scripts. Ancient Text Parsing covers all seven scripts and is scored by string-edit distance to the correct transcription. Script Classification asks the model which of the seven styles an image belongs to. Each task has its own metric, ranging from F1 with an intersection-over-union threshold to plain accuracy. Most of the README is taken up by a large leaderboard comparing many vision-language models on these tasks.
Open-source models listed include several sizes of InternVL, Qwen2.5-VL, Qwen3-VL, Qwen3.5, Gemma 4, MiniCPM-V, Molmo, Ovis2.6, GLM-4.5V, and Kimi K2.5. Proprietary models include GPT-4o, GPT-5, several Seed releases, MiMo-V2-Omni, and Gemini. The numbers are consistently low on archaic-script tasks, showing how hard the benchmark is even for the strongest models. Links to a paper on arXiv and to the dataset on HuggingFace are provided at the top.
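Character Spotting is described only as F1 with an intersection-over-union threshold. Below is a minimal sketch of that kind of metric. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max), a 0.5 IoU threshold, and greedy one-to-one matching from the highest IoU down; all of these are assumptions rather than the repo's confirmed protocol. `iou` and `spotting_f1` are hypothetical names.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def spotting_f1(preds: list[Box], gts: list[Box], thresh: float = 0.5) -> float:
    """F1 where a prediction counts as a true positive if it is matched
    one-to-one to a ground-truth box with IoU >= thresh (assumed rule)."""
    if not preds and not gts:
        return 1.0  # assumed convention for an image with nothing to find
    # Score every prediction/ground-truth pair, then match greedily by IoU.
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(preds) for j, g in enumerate(gts)),
        reverse=True,
    )
    used_p: set[int] = set()
    used_g: set[int] = set()
    tp = 0
    for score, i, j in pairs:
        if score < thresh:
            break  # remaining pairs are all below the threshold
        if i in used_p or j in used_g:
            continue  # each box may be matched at most once
        used_p.add(i)
        used_g.add(j)
        tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Whether the benchmark also requires a matched box to carry the correct character label is not stated here. If it does, that would be one extra condition inside the matching loop.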

Copy-paste prompts

Prompt 1

Walk me through downloading Chronicles-OCR from HuggingFace and running the Character Spotting task on Qwen3-VL

Prompt 2

Show me how the Ancient Text Parsing task computes edit distance and how to plug my own model in

Prompt 3

Help me reproduce the InternVL leaderboard numbers on the Script Classification task

Prompt 4

Sketch a fine-tuning pipeline that uses Chronicles-OCR's Oracle Bone subset to improve a small VLM

Prompt 5

Explain the four evaluation tasks in Chronicles-OCR and which one to start with if I only care about Regular Script
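Script Classification is scored by plain accuracy. If you care about a single style such as Regular Script, a per-script breakdown tells you more than the overall number. Below is a minimal sketch that assumes predictions and labels are plain strings drawn from the seven script names. The label strings and the `classification_accuracy` helper are hypothetical, not taken from the repo.

```python
from collections import Counter

# The seven scripts as named in the README summary; exact label strings are assumed.
SCRIPTS = ["Oracle Bone", "Bronze", "Seal", "Clerical", "Regular", "Cursive", "Running"]


def classification_accuracy(preds: list[str], labels: list[str]) -> dict[str, float]:
    """Overall accuracy plus per-script accuracy for Script Classification."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    total = Counter(labels)
    correct = Counter(label for pred, label in zip(preds, labels) if pred == label)
    result = {"overall": sum(correct.values()) / len(labels) if labels else 0.0}
    for script in SCRIPTS:
        if total[script]:  # only report scripts present in the evaluated set
            result[script] = correct[script] / total[script]
    return result
```

With the even 400-images-per-script split, overall accuracy on the full 2,800-image set equals the unweighted mean of the seven per-script accuracies. The breakdown therefore adds detail without changing the headline figure.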

Frequently asked questions

What is chronicles-ocr?

Research benchmark of 2,800 images covering the seven historical Chinese scripts, with four tasks and a leaderboard comparing vision-language models on archaic character reading.

What language is chronicles-ocr written in?

Mainly Python. The stack also includes HuggingFace.

How hard is chronicles-ocr to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is chronicles-ocr for?

Mainly researchers.



Verify against the repo before relying on details.