spiderclub/weibospider

Analysis updated 2026-06-26

★ 4,790 | Python | Audience · researcher | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((Weibospider))
    Data Collection
      User Profiles
      Posts and Comments
      Repost Relationships
      Keyword Search
    Core Libraries
      Celery Workers
      Requests HTTP
    Storage Layer
      MySQL Database
      Redis Coordinator
    Management
      Django Web UI
      Celery Beat Scheduler
    Deployment
      Multi Machine Scale
      YAML Configuration


What do people build with it?

USE CASE 1

Collect Weibo user profiles and post histories for academic or market research

USE CASE 2

Gather comments and repost graphs for social network analysis or NLP datasets

USE CASE 3

Monitor keyword topics on Weibo by scheduling periodic crawls across multiple workers

USE CASE 4

Build a Weibo dataset for Chinese-language natural language processing projects

What is it built with?

Python, Celery, Requests, MySQL, Redis, Django, YAML

How does it compare?

Metric	spiderclub/weibospider	collegeschat/university-information	listen1/listen1
Stars	4,790	4,789	4,789
Language	Python	Python	Python
Setup difficulty	moderate	easy	easy
Complexity	3/5	1/5	2/5
Audience	researcher	general	general

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 1h+

Requires running MySQL and Redis instances, a valid Weibo account, and a configured YAML file before starting workers. Because Weibo invalidates login cookies every 24 hours, the Celery beat process that refreshes them must also be kept running.
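The prerequisites above can be checked before any worker is launched. The following is a minimal sketch, assuming the YAML file has already been parsed into a Python dict. The section and key names (`db`, `redis`, `weibo_account`) and the helper `missing_settings` are hypothetical and chosen for illustration; the project's actual configuration layout may differ.

```python
# Hypothetical section and key names; the project's real YAML layout may differ.
REQUIRED_SETTINGS = {
    "db": ["host", "port", "user", "password", "db_name"],
    "redis": ["host", "port"],
    "weibo_account": ["name", "password"],
}


def missing_settings(config):
    """Return dotted names of required settings that are absent or empty."""
    missing = []
    for section, keys in REQUIRED_SETTINGS.items():
        values = config.get(section) or {}
        for key in keys:
            if values.get(key) in (None, ""):
                missing.append(f"{section}.{key}")
    return missing


# Stand-in for a parsed YAML file (illustrative values only).
example_config = {
    "db": {
        "host": "127.0.0.1",
        "port": 3306,
        "user": "root",
        "password": "secret",
        "db_name": "weibo",
    },
    "redis": {"host": "127.0.0.1"},
}

print(missing_settings(example_config))
```

Running a check like this first surfaces every gap in one pass, rather than letting a worker fail on the first missing value.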

License not mentioned in the explanation.

In plain English

Weibospider is a distributed data-collection tool for Weibo, the large Chinese social media platform. It gathers public information including user profiles, original posts from a specific account's homepage, comments on posts, repost relationships, and posts matching a given keyword search. The README is written in Chinese, and the project targets researchers and developers working with Weibo data for analysis or natural language processing.

The system is built on top of two popular Python libraries: Celery, which handles task scheduling and distribution across multiple machines, and Requests, which handles the underlying HTTP communication. Data is stored in a MySQL database, and Redis is used to coordinate the Celery workers. The project explicitly avoids browser automation for login, relying instead on manually analyzed network requests, which the authors say makes the scraper more stable over long runs.

Setting the system up requires configuring a YAML file with your MySQL and Redis connection details, Weibo account credentials, and notification email settings. You then create the database tables, optionally start a small Django-based web interface for managing crawl targets, and launch one or more Celery workers. A separate Celery beat process handles periodic tasks such as refreshing login cookies, which Weibo invalidates every 24 hours.

Because it runs as separate workers, you can spread the load across multiple machines simply by installing the dependencies on each machine and pointing them at the same Redis and MySQL instances. The project includes rate-limiting controls in its configuration file, and the authors ask users to keep crawl frequency reasonable to avoid disrupting the Weibo platform.
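Two timing rules from the description above, the 24-hour cookie lifetime and a per-minute request limit, can be expressed in a few lines. This is a minimal standard-library sketch, not the project's implementation (which schedules refreshes through Celery beat). The function names and the one-hour safety margin are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Weibo invalidates login cookies after 24 hours (per the description above).
COOKIE_LIFETIME = timedelta(hours=24)
# Assumed margin so a refresh happens before expiry; illustrative only.
SAFETY_MARGIN = timedelta(hours=1)


def cookie_needs_refresh(logged_in_at, now):
    """True once the cookie is within the safety margin of expiring."""
    return now - logged_in_at >= COOKIE_LIFETIME - SAFETY_MARGIN


def min_interval_seconds(requests_per_minute):
    """Smallest delay between requests that respects a per-minute limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


login_time = datetime(2026, 1, 1, 8, 0)
print(cookie_needs_refresh(login_time, datetime(2026, 1, 1, 20, 0)))  # cookie is 12 h old
print(cookie_needs_refresh(login_time, datetime(2026, 1, 2, 7, 30)))  # cookie is 23.5 h old
print(min_interval_seconds(10))  # delay for 10 requests per minute
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the function, keeps the check deterministic and easy to test.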

Copy-paste prompts

Prompt 1

I have the Weibospider repo cloned and workers running. Write a Python script that queries my MySQL database to count the total posts collected per user and export the top 20 most-collected users to a CSV file.

Prompt 2

Using the Weibospider codebase, show me how to add a new Celery task that crawls follower lists for a given Weibo user ID and stores each follower's UID and username in a new MySQL table.

Prompt 3

I want to analyze sentiment on Weibo posts collected by Weibospider. Write a Python script that reads the posts table from MySQL and runs a simple Chinese-language sentiment classifier using snownlp on each post body.

Prompt 4

Help me configure Weibospider's YAML file to crawl keyword AI every 6 hours across 3 worker machines sharing the same Redis and MySQL instances, with a rate limit of 10 requests per minute.

Prompt 5

Explain the Celery beat schedule in Weibospider and show me how to change the cookie-refresh interval from 24 hours to 12 hours in the configuration.

Frequently asked questions

What is weibospider?

Weibospider is a distributed Python scraper for Weibo that collects user profiles, posts, comments, reposts, and keyword search results, storing data in MySQL with Celery workers coordinated via Redis.

What language is weibospider written in?

Mainly Python. The stack also includes Celery, Requests, MySQL, Redis, Django, and YAML configuration.

What license does weibospider use?

License not mentioned in the explanation.

How hard is weibospider to set up?

Setup difficulty is rated moderate; expect an hour or more before a first successful run.

Who is weibospider for?

Mainly researchers, along with developers working with Weibo data for analysis or natural language processing.


Verify against the repo before relying on details.