Analysis updated 2026-05-18
Learn how to automatically collect data from websites and apps using Python from scratch.
Build a web scraper that handles logins, CAPTCHAs, and anti-scraping protections.
Set up a distributed scraping system that runs across multiple servers to gather large amounts of data.
Understand how to inspect network traffic and parse web page content programmatically.
| | wistbean/learn_python3_spider | xiaomi/ha_xiaomi_home | recommenders-team/recommenders |
|---|---|---|---|
| Stars | 21,629 | 21,654 | 21,669 |
| Language | Python | Python | Python |
| Setup difficulty | easy | moderate | moderate |
| Complexity | 2/5 | 2/5 | 3/5 |
| Audience | developer | vibe coder | researcher |
Figures from each repo's GitHub metadata at analysis time.
learn_python3_spider is a Chinese-language tutorial series for learning Python web scraping from scratch. The description frames it as a "from zero to one" guide aimed at people new to scraping who want a structured path through the topic. Rather than a single library, the repository is essentially a curated reading list with an accompanying collection of examples, linking out to a long sequence of articles that build skills step by step.
According to the description, the series covers the full range of practical scraping work. It walks through capturing browser and mobile-app traffic with tools like Fiddler and mitmproxy, then introduces the common Python modules used in scrapers, including requests, BeautifulSoup, Selenium, Appium, and Scrapy. It also touches on the supporting skills a scraper needs in the wild: rotating IP proxies to avoid being blocked, recognising CAPTCHAs, storing scraped data in MySQL and MongoDB databases, running scrapes in multiple threads or processes for speed, reversing CSS-based and JavaScript-based anti-scraping protections, and building distributed scrapers across machines. Several end-to-end project examples round out the series.
The repo works as a self-study curriculum rather than as a code library you install. It fits a beginner who can read Chinese and wants a single roadmap from "what is a scraper" through to advanced reverse-engineering, instead of piecing tutorials together themselves. The repository's primary language is listed as Python.
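As a rough illustration of two of the skills mentioned above (parsing page content and running work in multiple threads), here is a minimal sketch that is not taken from the repository. It uses only the Python standard library: `html.parser` stands in for BeautifulSoup, and the in-memory `PAGES` list stands in for downloaded responses. The names `LinkCollector`, `extract_links`, `extract_all`, and `PAGES` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(page):
    """Parse one (url, html) pair and return the links it contains."""
    url, html = page
    collector = LinkCollector(url)
    collector.feed(html)
    return collector.links


# Assumption: in-memory pages standing in for real HTTP responses.
PAGES = [
    ("https://example.com/", '<a href="/a">A</a> <a href="b.html">B</a>'),
    ("https://example.com/docs/", '<p>No links here</p><a href="intro">Intro</a>'),
]


def extract_all(pages, workers=4):
    """Process pages in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_links, pages))


if __name__ == "__main__":
    for links in extract_all(PAGES):
        print(links)
```

In a real scraper the thread pool would mostly pay off for the download step, which is I/O-bound; the parsing here is kept in the worker function only to keep the sketch self-contained and free of network access.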
A Chinese-language tutorial series teaching web scraping with Python 3, from basics to advanced techniques like handling logins, CAPTCHAs, and distributed scrapers.
Mainly Python 3. The stack also includes Fiddler and mitmproxy.
Use freely for any purpose, including commercial use, as long as you keep the copyright notice.
Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.
Audience: mainly developers.
Verify against the repo before relying on details.