explaingit

kr1s77/python-crawler-tutorial-starts-from-zero

4,591 · Python · Audience · developer · Complexity · 2/5 · Setup · easy

TLDR

A Chinese-language beginner tutorial series for Python web scraping, covering HTTP requests, data parsing, real-world examples on sites like Douban, and advanced topics like Selenium, Scrapy, and MongoDB.

Mindmap

mindmap
  root((python-crawler-tutorial))
    Basics
      What a scraper is
      HTTP requests
      Python requests lib
    Parsing
      Regular expressions
      JSON parsing
    Real examples
      Douban movie site
      Baidu forums
    Advanced
      Selenium browser control
      Scrapy framework
      MongoDB storage
      OCR from images

Things people build with this

USE CASE 1

Learn to fetch web pages and extract structured data using Python requests and JSON parsing by following step-by-step lessons.
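A minimal sketch of that fetch-and-parse flow, assuming a hypothetical endpoint that returns a JSON object with an `items` array. The URL, the field names, and the helper names `fetch_text` and `extract_titles` are illustrative and not taken from the tutorial. The network call is kept in its own function so the parsing step can be tried offline.

```python
import json


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and return the response body as text."""
    import requests  # third-party; imported lazily so parsing works without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
    return response.text


def extract_titles(payload: str) -> list[str]:
    """Return the `title` of every entry in the `items` array that has one."""
    data = json.loads(payload)
    return [item["title"] for item in data.get("items", []) if "title" in item]


# Hypothetical response body, standing in for
# fetch_text("https://example.com/api/list")
SAMPLE = (
    '{"items": [{"title": "First post", "id": 1}, '
    '{"title": "Second post", "id": 2}, {"id": 3}]}'
)

if __name__ == "__main__":
    print(extract_titles(SAMPLE))
```

Entries without a `title` key are skipped rather than raising, which tends to be the safer default when a site's responses are not fully uniform.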

USE CASE 2

Build a movie data scraper for Douban that collects titles, ratings, and reviews into a structured format.
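As a rough illustration of what "a structured format" could look like, the sketch below parses a small inline HTML sample with a regular expression and serializes the result as JSON. The markup, the class names, and the `Movie` fields are assumptions made for the example; Douban's real page structure will differ, and the tutorial's own code may take another approach.

```python
import json
import re
from dataclasses import asdict, dataclass


@dataclass
class Movie:
    title: str
    rating: float
    review_count: int


# Hypothetical markup; not copied from Douban.
SAMPLE_HTML = """
<div class="item"><span class="title">Movie A</span><span class="rating">9.1</span><span class="reviews">1200 reviews</span></div>
<div class="item"><span class="title">Movie B</span><span class="rating">8.4</span><span class="reviews">350 reviews</span></div>
"""

# One match per movie entry; named groups keep the extraction readable.
ITEM_PATTERN = re.compile(
    r'<span class="title">(?P<title>.*?)</span>'
    r'<span class="rating">(?P<rating>[\d.]+)</span>'
    r'<span class="reviews">(?P<count>\d+) reviews</span>'
)


def parse_movies(html: str) -> list[Movie]:
    """Convert every matched entry into a typed Movie record."""
    return [
        Movie(m["title"], float(m["rating"]), int(m["count"]))
        for m in ITEM_PATTERN.finditer(html)
    ]


def to_json(movies: list[Movie]) -> str:
    """Serialize the records; ensure_ascii=False keeps Chinese titles readable."""
    return json.dumps([asdict(m) for m in movies], ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(to_json(parse_movies(SAMPLE_HTML)))
```

Regular expressions are workable for small, regular snippets like this one; for messier pages an HTML parser is usually less brittle.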

USE CASE 3

Set up Scrapy for larger crawling projects and store results automatically in a MongoDB database.
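One way the "store results automatically" step might look is a Scrapy item pipeline that writes each item to MongoDB. This is a sketch under assumptions: the setting names `MONGO_URI` and `MONGO_DATABASE`, the default values, and the collection name are placeholders chosen for the example, and the hook signatures follow the long-standing Scrapy pipeline interface, so check them against the Scrapy version you install.

```python
class MongoPipeline:
    """Item pipeline that inserts each scraped item into a MongoDB collection."""

    def __init__(self, mongo_uri, db_name, collection_name="items"):
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the pipeline from project settings.
        # Both setting names below are assumptions, defined by the user.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            db_name=crawler.settings.get("MONGO_DATABASE", "scrapy_demo"),
        )

    def open_spider(self, spider):
        import pymongo  # third-party; imported lazily so the class loads without it

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Store a plain-dict copy and pass the item on to later pipelines.
        self.collection.insert_one(dict(item))
        return item
```

The pipeline would be enabled through the `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"myproject.pipelines.MongoPipeline": 300}`, where the module path is hypothetical.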

Tech stack

Python · requests · Selenium · Scrapy · MongoDB

Getting it running

Difficulty · easy | Time to first run · 30 min

In plain English

This repository is a Chinese-language tutorial series for learning how to write web scrapers in Python, starting from the very basics. A web scraper is a program that automatically visits websites and collects information from them, such as product listings, article titles, or user reviews.

The tutorial is structured as a series of numbered lessons, written as Markdown documents linked from the README. Early chapters cover foundational topics: what a scraper is, how web requests work, how to use Python's requests library to fetch pages, and how to pull specific pieces of data out of a page using tools like regular expressions and JSON parsing.

Alongside the core lessons, the repository includes practical worked examples targeting real Chinese websites, including a movie site (Douban) and Baidu's discussion forums. These examples walk through building an actual scraper step by step.

More advanced topics mentioned in the project description include reverse-engineering JavaScript code to bypass protections, using Selenium to control a browser programmatically, reading text from images using OCR, storing results in a MongoDB database, and using the Scrapy framework for larger crawling projects.

The README is written in Chinese, and the external references it links to are a mix of Wikipedia, Chinese tech documentation, and developer tutorials. The project is aimed at Chinese-speaking beginners who want a structured path into web scraping with Python. The README is relatively short; the bulk of the learning content lives in the linked Markdown files rather than in the README itself.

Copy-paste prompts

Prompt 1
Write me a Python scraper using requests and BeautifulSoup that extracts the title, rating, and number of reviews from each movie on a Douban list page, and saves results as a JSON file.
Prompt 2
How do I use Selenium in Python to scrape a page that requires JavaScript to load its content? Give me a working example that waits for the dynamic content to appear.
Prompt 3
I want to use Scrapy to crawl a website and save results to MongoDB. Show me a complete spider with an item pipeline that connects to MongoDB and inserts each scraped item.

Verify against the repo before relying on details.