binux/pyspider

Analysis updated 2026-06-24

★ 16,810 | Python | Audience · data | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((pyspider))
    Inputs
      Seed URLs
      Python handler scripts
      Crawl schedule
    Outputs
      Extracted records
      Database rows
      Web dashboard view
    Use Cases
      Scrape sites on a schedule
      Crawl JavaScript pages
      Distribute crawls across machines
    Tech Stack
      Python
      MySQL or MongoDB
      Message queue
      Web UI


What do people build with it?

USE CASE 1

Scrape product prices from a list of e-commerce sites on a daily schedule

USE CASE 2

Crawl news sites and store articles in MongoDB for later analysis

USE CASE 3

Run distributed crawl workers across several machines coordinated by a message queue

What is it built with?

Python, MySQL, MongoDB, PostgreSQL, SQLite

How does it compare?

	binux/pyspider	lllyasviel/framepack	exaloop/codon
Stars	16,810	16,810	16,769
Language	Python	Python	Python
Setup difficulty	moderate	hard	moderate
Complexity	3/5	4/5	4/5
Audience	data	general	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

Project is no longer actively maintained and may need older Python or pinned dependencies to install cleanly.

In plain English

pyspider is a web crawling framework written in Python. A web crawler, also called a spider, is a program that automatically visits websites, reads their content, and extracts data from them. pyspider makes these programs easier to build: it handles the scheduling, retrying, and storage of crawl jobs, so you can focus on the logic for what to collect.

The framework comes with a web-based interface where you can write and edit crawl scripts, monitor running tasks, manage projects, and view results, all from a browser. Many crawlers, by contrast, are purely command-line tools.

The sample code in the README shows the basic pattern: you define a handler class with methods for different types of pages. One method handles the starting page, finds links, and queues them for crawling. Another method extracts the specific data you want from each page; in the README example, that is the URL and title.

pyspider supports multiple database backends for storing results (MySQL, MongoDB, PostgreSQL, SQLite, and others) and multiple message queue systems for coordinating work across distributed machines. It supports crawling JavaScript-heavy pages and can be configured with task priorities, automatic retries on failure, and scheduled re-crawls on a time interval.

You would use pyspider if you need to regularly scrape data from websites at scale, for example to monitor prices, aggregate content, or build a dataset. It is installed via pip and starts with a single command.
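The handler pattern described above can be sketched in plain Python. This is a minimal illustration that runs offline and does not use pyspider's own base class or API. The `FAKE_WEB` table, the `Response` class, the `run` loop, and the method names are all hypothetical stand-ins chosen for this example; in a real pyspider project the framework supplies the fetching, queueing, and result storage.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (title, outgoing links).
# Stands in for real HTTP fetching so the sketch runs offline.
FAKE_WEB = {
    "http://example.com/": ("Home", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("Page A", []),
    "http://example.com/b": ("Page B", []),
}


class Response:
    """Minimal stand-in for a fetched page."""

    def __init__(self, url, title, links):
        self.url = url
        self.title = title
        self.links = links


class Handler:
    """Illustrative handler: one method per page type."""

    def __init__(self):
        self.queue = deque()  # pending (url, callback) tasks
        self.seen = set()     # URLs already queued, to avoid repeats

    def crawl(self, url, callback):
        # Queue a URL once, remembering which method should process it.
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append((url, callback))

    def on_start(self):
        # Seed the crawl with the starting page.
        self.crawl("http://example.com/", callback=self.index_page)

    def index_page(self, response):
        # Find links on the starting page and queue each for extraction.
        for link in response.links:
            self.crawl(link, callback=self.detail_page)

    def detail_page(self, response):
        # Extract the fields to keep; a returned dict counts as a result.
        return {"url": response.url, "title": response.title}


def run(handler):
    """Drain the queue, collecting whatever the callbacks return."""
    results = []
    handler.on_start()
    while handler.queue:
        url, callback = handler.queue.popleft()
        title, links = FAKE_WEB[url]
        record = callback(Response(url, title, links))
        if record is not None:
            results.append(record)
    return results


results = run(Handler())
```

Each queued task carries its own callback, so the same loop can process a listing page and a detail page without knowing which is which. That separation lets a framework take over scheduling and retries while the script only describes what to follow and what to keep.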

Copy-paste prompts

Prompt 1

Install pyspider locally and write a script that crawls Hacker News and stores titles in SQLite

Prompt 2

Configure pyspider to use MongoDB as the result store and Redis as the message queue

Prompt 3

Write a pyspider handler that follows pagination links and extracts product price and name

Prompt 4

Show me how to schedule a pyspider project to re-crawl every 24 hours

Frequently asked questions

What is pyspider?

A Python web crawler framework with a browser-based dashboard for writing scripts, monitoring jobs, and storing results in your choice of database.

What language is pyspider written in?

Mainly Python. The stack also includes MySQL, MongoDB, PostgreSQL, and SQLite as storage backends.

How hard is pyspider to set up?

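Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run. The project is no longer actively maintained, so a clean install may need an older Python or pinned dependencies.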

Who is pyspider for?

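Mainly people who work with data, such as those who scrape sites on a schedule, aggregate content, or build datasets.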


Verify against the repo before relying on details.