explaingit

shengqiangzhang/examples-of-web-crawlers

14,629 · HTML · Audience: developer · Complexity: 2/5 · License · Setup: moderate

TLDR

A beginner-friendly collection of Python web scraping scripts that automatically collect data from websites, including Chinese shopping sites, stock platforms, and social apps like WeChat and QQ.

Mindmap

mindmap
  root((repo))
    What it does
      Scrape websites
      Collect data
      Automate login
    Tech stack
      Python
      Selenium
      Chrome
    Use cases
      E-commerce data
      Stock data
      Social app reports
    Audience
      Beginners
      Data collectors

Things people build with this

USE CASE 1

Scrape product listings or rankings from Chinese e-commerce sites like Taobao or Tmall automatically.

USE CASE 2

Collect stock and mutual fund data from financial sites using multiple threads and rotating IP addresses to avoid blocks.

USE CASE 3

Generate a summary report from your WeChat contacts or QQ chat history with a single Python script.

USE CASE 4

Download high-resolution wallpapers automatically from a wallpaper app.
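Use case 2 combines two ideas: several worker threads download pages in parallel, and each request goes out through the next proxy in a pool so that no single IP address carries all the traffic. The following is a minimal sketch of that pattern using only the Python standard library. It is not the repo's own code; the names `ProxyPool`, `fetch_via_proxy`, and `crawl`, and the proxy addresses, are assumptions made for illustration.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor
from urllib.request import ProxyHandler, build_opener

# Hypothetical proxy addresses, for illustration only.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]


class ProxyPool:
    """Hands out proxies round-robin; safe to call from several threads."""

    def __init__(self, proxies):
        self._cycle = itertools.cycle(proxies)
        self._lock = threading.Lock()

    def take(self):
        # The lock keeps two threads from advancing the iterator at once.
        with self._lock:
            return next(self._cycle)


def fetch_via_proxy(url, proxy, timeout=10):
    """Download one URL through the given proxy and return the raw bytes."""
    opener = build_opener(ProxyHandler({"http": proxy, "https": proxy}))
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def crawl(urls, pool, fetch=fetch_via_proxy, workers=4):
    """Fetch every URL on a thread pool, taking the next proxy for each request.

    Returns a list of (url, proxy, result) tuples in the same order as `urls`;
    `result` is the downloaded body, or the exception if the request failed.
    """
    def task(url):
        proxy = pool.take()
        try:
            return url, proxy, fetch(url, proxy)
        except Exception as exc:  # a dead proxy should not stop the other downloads
            return url, proxy, exc

    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(task, urls))
```

Calling `crawl(list_of_urls, ProxyPool(PROXIES))` would run the downloads. The `fetch` parameter exists so the download step can be swapped out, for example for a function that retries a failed URL with a different proxy.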

Tech stack

Python · Selenium · Chrome

Getting it running

Difficulty: moderate · Time to first run: 30 min

Some examples require ChromeDriver for Selenium and personal account credentials. The README is primarily in Chinese and may need translation.

MIT license: use, modify, and share freely, including in commercial projects.

In plain English

This repository is a collection of Python web crawling examples aimed at beginners. Web crawling means writing code that automatically visits websites and collects data from them, much like how a person would open a page, read the information, and copy it down, except the program does it automatically at scale. The code examples here are described as beginner-friendly, with heavy commenting to explain each step.

The collection covers more than a dozen separate projects, each targeting a specific task. Several examples focus on Chinese platforms such as Taobao, Tmall, and Douban, which are major Chinese shopping and entertainment sites. For those examples, the scripts use a tool called Selenium, which controls a real Chrome browser window so that the code can log in and navigate pages that would otherwise block automated access.

Other examples include: downloading high-resolution wallpapers from a Mac wallpaper app, scraping movie rankings from Douban (a Chinese film review site), collecting mutual fund and stock data from a financial site using multiple threads and a pool of rotating IP addresses to avoid being blocked, generating a personal report from your WeChat contact list, and generating a historical summary report from your QQ account. There is also a script that sends scheduled reminder messages to a contact via WeChat at set times each day.

Most examples follow the same setup pattern: install a few Python packages listed in a requirements file, optionally download a Chrome browser driver if Selenium is needed, fill in your account credentials in the script, and then run a single Python file. Some projects include animated screenshots in the README showing the program in action.

The README is written primarily in Chinese, but the code itself and the project structure are straightforward enough that the steps can be followed with the help of a translation tool. The project is licensed under the MIT license.
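The scheduled reminder script mentioned above has to decide when the next message is due. One simple way to do that, sketched here with the standard `datetime` module, is to compute the next occurrence of a fixed clock time and sleep until then. This is an illustration under assumptions, not the repo's implementation, which may use a different scheduling mechanism; `next_run` and `seconds_until` are hypothetical helper names.

```python
from datetime import datetime, time, timedelta


def next_run(now, at):
    """Return the first datetime strictly after `now` whose clock time is `at`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def seconds_until(now, at):
    """Number of seconds to wait from `now` until the next occurrence of `at`."""
    return (next_run(now, at) - now).total_seconds()
```

A reminder loop could call `seconds_until(datetime.now(), time(9, 0))`, pass the result to `time.sleep` from the standard `time` module, and then invoke whatever function actually sends the WeChat message.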

Copy-paste prompts

Prompt 1
Help me adapt the Taobao scraping example from this repo to download product prices and titles into a CSV file.
Prompt 2
I want to use the Selenium-based crawler from examples-of-web-crawlers to log into a site. Walk me through setting up ChromeDriver correctly.
Prompt 3
Show me how to use the multi-threaded scraping example in this repo to collect data faster with a rotating IP proxy pool.
Prompt 4
Help me modify the scheduled WeChat reminder script to send a message to a contact every morning at 9am.

Verify against the repo before relying on details.