explaingit

wistbean/learn_python3_spider

📈 Trending · 21,629 · Python · Audience · developer · Complexity · 2/5 · Active · License · Setup · easy

TLDR

A Chinese-language tutorial series teaching web scraping with Python 3, from basics to advanced techniques like handling logins, CAPTCHAs, and distributed scrapers.

Mindmap

mindmap
  root((repo))
    What it does
      Teaches web scraping
      Automates data collection
      Covers Python 3
    Topics covered
      Network traffic inspection
      Page parsing libraries
      Login handling
      CAPTCHA bypass
      Anti-scraping measures
      Mobile app automation
      Database storage
      Distributed scrapers
    Learning path
      Beginner concepts
      Intermediate techniques
      Advanced scenarios
    Tools and libraries
      Fiddler
      mitmproxy
      Python libraries
    Use cases
      Learning web scraping
      Building data collectors
      Understanding automation

Things people build with this

USE CASE 1

Learn how to automatically collect data from websites and apps using Python from scratch.

USE CASE 2

Build a web scraper that handles logins, CAPTCHAs, and anti-scraping protections.

USE CASE 3

Set up a distributed scraping system that runs across multiple servers to gather large amounts of data.

USE CASE 4

Understand how to inspect network traffic and parse web page content programmatically.

Tech stack

Python 3 · Fiddler · mitmproxy

Getting it running

Difficulty · easy · Time to first run · 5 min
Use freely for any purpose including commercial, as long as you keep the copyright notice.

In plain English

This repository is a Chinese-language tutorial series that teaches Python web scraping from scratch. Web scraping means writing a program that automatically visits web pages or mobile apps and extracts data from them. The series is presented as a curated reading list: the README is essentially a table of contents, with each entry linking out to a full article hosted on WeChat or a separate blog. Its value lies in the structured curriculum, which walks through the topic in order.

It starts with how to inspect the traffic a browser or mobile app sends and receives, using packet-capture tools like Fiddler and mitmproxy. It then introduces Python libraries used to fetch pages and extract information from them, including urllib, requests, BeautifulSoup, and Selenium, and shows how to use Selenium with PhantomJS to drive a browser.

Later articles cover handling login pages, recognising image-based verification codes, defeating anti-scraping tricks like CSS-based font encryption and JavaScript obfuscation, scraping mobile apps with Appium, running scrapers across multiple threads and processes, using IP proxy pools to avoid being blocked, saving results into CSV files or MySQL and MongoDB databases, visualising scraped data, building a Scrapy-based crawler, and finally running a distributed scraper across several servers.

You would use this repository if you read Chinese and want a step-by-step path into Python scraping without prior experience, working through each linked article in order. The full README is longer than the excerpt this summary was generated from.
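
To make the fetch, parse, and store stages described above concrete, here is a minimal sketch in Python 3. It is not taken from the repository. The tutorials use requests and BeautifulSoup, but this sketch uses only the standard library (urllib.request, html.parser, csv) so it runs without installing anything. The names fetch, LinkParser, parse_links, to_csv, and SAMPLE_HTML are hypothetical, chosen for illustration, and the demo parses an inline HTML string rather than a live site.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url, timeout=10):
    """Download a page and return its text (defined for reference, not called below)."""
    # Many sites reject requests without a User-Agent; this value is a placeholder.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (tutorial sketch)"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkParser(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_links(html):
    """Parse an HTML string and return the links found in it."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def to_csv(rows):
    """Serialise (href, text) rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["href", "text"])
    writer.writerows(rows)
    return buf.getvalue()


# Inline sample page so the demo needs no network access.
SAMPLE_HTML = """
<ul>
  <li><a href="/post/1">First post</a></li>
  <li><a href="/post/2">Second <b>post</b></a></li>
</ul>
"""

links = parse_links(SAMPLE_HTML)
csv_text = to_csv(links)
print(csv_text)
```

To run it against a real page, pass the result of fetch(url) to parse_links instead of SAMPLE_HTML, after checking that the site's terms and robots.txt allow scraping. BeautifulSoup would replace LinkParser with far less code, which is presumably why the tutorials introduce it.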

Copy-paste prompts

Prompt 1
Show me how to use Python to scrape data from a website, starting with fetching a page and parsing its HTML content.
Prompt 2
How do I handle login requirements when scraping a website that requires authentication?
Prompt 3
What techniques can I use to bypass CAPTCHA and anti-scraping measures when collecting data from websites?
Prompt 4
How do I set up a distributed web scraper that can run across multiple servers to collect data at scale?
Prompt 5
Explain how to intercept and inspect network traffic from a mobile app using mitmproxy or Fiddler.

Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.