Run multiple Scrapy crawlers in parallel across different machines, all drawing from the same shared Redis URL queue.
Track which URLs have already been crawled using Redis so your distributed scrapers never visit the same page twice.
Push scraped items into a Redis queue so separate post-processing scripts can consume and handle them asynchronously.
Pass structured JSON data with URL, metadata, and form data through the Redis queue to crawlers that need rich context per request.
Requires a running Redis 5.0+ server and an existing Scrapy project; install with pip, then configure through your Scrapy settings.
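As a rough illustration of the settings-based setup, a project's settings.py might look something like the sketch below. The class paths and setting names are the ones documented for upstream scrapy-redis and are assumed to apply to this repository unchanged. The Redis URL and the pipeline priority of 300 are placeholder values.

```python
# settings.py (sketch) -- names assumed from upstream scrapy-redis docs;
# verify them against the repository before use.

# Store the request queue in Redis instead of in process memory.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Record request fingerprints in Redis so all crawlers share one "seen" set.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and the seen set in Redis between runs (pause/resume).
SCHEDULER_PERSIST = True

# Push scraped items into a Redis list for separate post-processing scripts.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,  # placeholder priority
}

# Placeholder connection string; point this at your own Redis server.
REDIS_URL = "redis://localhost:6379"
```

Every crawler process that loads the same settings and points at the same Redis server would then share one queue and one record of visited URLs.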
Scrapy-Redis adds Redis-based components to Scrapy, a Python library for crawling websites and extracting data from them. Redis is an in-memory data store commonly used to share information quickly between running processes. By connecting the two, Scrapy-Redis lets you run several crawlers at once, all drawing from the same shared queue of URLs, which is useful when you need to collect data from many websites faster than a single process can manage.

The library provides three main pieces: a scheduler that stores the crawl queue in Redis instead of in memory, a duplication filter that records which URLs have already been visited so they are not crawled twice, and an item pipeline that pushes scraped results into a Redis queue so separate post-processing scripts can pick them up. These components are described as plug-and-play: you enable them in your Scrapy settings without rewriting your spiders.

This particular fork also supports passing structured JSON data through the Redis queue. Each entry can include a URL, metadata, and optional form data, which the spider reads when making requests. This extends basic distributed crawling to workflows that need richer context alongside each URL.

The library requires Python 3.7 or newer, Redis 5.0 or newer, and Scrapy 2.0 or newer. It is installed via pip and released under the MIT license.
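The structured JSON entries described above could be produced along the following lines. This is a minimal sketch, not the repository's own code. The helper names build_queue_entry and push_entry are hypothetical, and so are the JSON key names ("url", "meta", "form_data"), since the summary states only that an entry carries a URL, metadata, and optional form data. The push step assumes the redis-py client and a Redis list as the queue.

```python
import json
from typing import Any, Dict, Optional


def build_queue_entry(
    url: str,
    meta: Optional[Dict[str, Any]] = None,
    form_data: Optional[Dict[str, str]] = None,
) -> str:
    """Serialize one crawl request as a JSON string for the Redis queue."""
    if not url:
        raise ValueError("url is required")
    entry: Dict[str, Any] = {"url": url}
    if meta:
        entry["meta"] = meta
    if form_data:
        # Hypothetical key name; check the repo for the field the spider reads.
        entry["form_data"] = form_data
    return json.dumps(entry)


def push_entry(redis_url: str, queue_key: str, payload: str) -> int:
    """Push one serialized entry onto a Redis list; returns the new list length."""
    import redis  # redis-py, imported lazily so build_queue_entry works without it

    client = redis.Redis.from_url(redis_url)
    return client.lpush(queue_key, payload)


# Build an entry without touching Redis.
payload = build_queue_entry(
    "https://example.com/search",
    meta={"job_id": "demo-1"},
    form_data={"q": "scrapy"},
)
```

With a Redis server running, a call such as push_entry("redis://localhost:6379", "myspider:start_urls", payload) would enqueue the entry. The key name here is an assumption and has to match whatever key the spider is configured to read.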
Verify against the repo before relying on details.