explaingit

hiddendevj/crawler_illegal_cases_in_china

4,612 · HTML · Audience · developer · Complexity · 1/5 · Setup · easy

TLDR

A reference collection of real legal cases, laws, and news about web scraping violations in China, organized by violation type to help developers understand the legal lines before building data collection tools.

Mindmap

mindmap
  root((Scraping Legal Cases China))
    Case Categories
      Illegal service selling
      Personal data scraping
      Commercial data resale
      Server overload attacks
    Laws Referenced
      Criminal Law
      Cybersecurity Law
      Anti-Unfair Competition
      Personal Info Protection
    Audience
      Developers in China
      Legal researchers
      Compliance teams
    Content
      Case summaries
      Law articles
      Lawyer analysis

Things people build with this

USE CASE 1

Research which scraping behaviors have led to criminal prosecution in China before building a data collection tool.

USE CASE 2

Understand the specific laws (Criminal Law, Cybersecurity Law, Anti-Unfair Competition Law) that apply to scraping personal or commercial data.

USE CASE 3

Identify the thresholds at which scraping activity becomes a criminal offense rather than a civil matter in China.

Tech stack

HTML

Getting it running

Difficulty · easy · Time to first run · 5 min

The README and most of the content are written in Chinese.

In plain English

This repository is a reference collection of legal cases, news articles, and relevant laws concerning web scraping violations in mainland China. It is aimed at developers who build data collection tools and want to understand where the legal lines are drawn, so they can avoid conduct that has led to criminal prosecution or civil penalties for others.

The cases are grouped into five categories of activity that have resulted in legal trouble. The first involves providing scraping services to organizations engaged in illegal activity, such as selling CAPTCHA-cracking tools. The second covers scraping and selling personal data belonging to individuals, including resumes, social security details, and account credentials. The third involves profiting from data that belongs to a commercial platform, such as reselling scraped listings or charging others for access to a scraping interface. The fourth covers cases where aggressive scraping caused a target server to go down, including one in which a developer and their manager were both convicted after their crawler sent 183 requests per second and brought down a government computing system. The fifth category is listed without a description.

Alongside the case summaries, the repository includes a section explaining the specific laws that apply to each type of violation. These draw from China's Criminal Law, the Cybersecurity Law, the Anti-Unfair Competition Law, and civil law statutes covering personal information protection. The descriptions quote specific articles and outline the thresholds at which behavior becomes a criminal offense. For example, illegally obtaining location or financial data on more than fifty people constitutes a serious violation.

The repository also links to analysis articles written by lawyers covering the legal risks facing data industry practitioners in China. The README is written entirely in Chinese.
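The server-overload case above involved a crawler sending 183 requests per second. As a rough illustration of what capping a crawler's request rate can look like, here is a minimal Python sketch. The `Throttle` class, its parameters, and the rate used in the usage note are all hypothetical and are not taken from the repository; the repository does not state a request rate that is legally safe.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum interval between requests."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        # Smallest allowed gap between two consecutive requests, in seconds.
        self.min_interval = 1.0 / max_per_second
        # clock and sleep are injectable so the class can be tested
        # without actually waiting.
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Schedule the earliest moment the following request may start.
        self._next_allowed = now + self.min_interval
        return delay
```

A crawler loop would create one instance, for example `throttle = Throttle(max_per_second=1)`, and call `throttle.wait()` before each request. The value of one request per second is an arbitrary assumption chosen for illustration, and a real tool would also need to respect whatever limits the target site publishes.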

Copy-paste prompts

Prompt 1
Based on the cases in this repo, what types of web scraping have resulted in criminal prosecution in China and what were the typical outcomes?
Prompt 2
I'm building a scraper that collects publicly visible job listings in China. Based on this repo's cases, what legal risks should I be aware of?
Prompt 3
What does China's Cybersecurity Law say about scraping personal data, and what threshold makes it a criminal offense? Reference the specific articles mentioned in this repo.
Prompt 4
What happened in the case where a crawler sent 183 requests per second to a government system and what were the legal consequences for the developer?

Verify against the repo before relying on details.