explaingit

ssssssss-team/spider-flow

11,325 · Java · Audience · developer · Complexity · 3/5 · License · MIT · Setup · moderate

TLDR

A no-code web scraping platform where you build scrapers by drawing a flowchart. Handles dynamic pages, proxy rotation, database storage, scheduling, and OCR without writing code.

Mindmap

mindmap
  root((spider-flow))
    What it does
      Visual scraper builder
      Data extraction
      Task scheduling
    Tech stack
      Java
      Selenium plugin
      Redis plugin
    Use cases
      E-commerce monitoring
      Database population
      API-triggered jobs
    Audience
      Data teams
      Developers

Things people build with this

USE CASE 1

Build a price-monitoring scraper that extracts product data from e-commerce pages and saves results directly to a database.
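In spider-flow this extraction step is configured in the flowchart rather than written by hand. The sketch below only illustrates what an XPath extraction of product names and prices does, using the JDK's built-in `javax.xml.xpath`. It is not spider-flow's code. The markup, the class names (`product`, `name`, `price`) and the `PriceExtractor` class are hypothetical. The JDK parser accepts only well-formed XML, so real-world HTML would first need an HTML parser. The database write is omitted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PriceExtractor {
    // Hypothetical, well-formed product markup used only for illustration.
    static final String SAMPLE =
        "<html><body>"
        + "<div class='product'><span class='name'>Kettle</span><span class='price'>24.99</span></div>"
        + "<div class='product'><span class='name'>Toaster</span><span class='price'>39.50</span></div>"
        + "</body></html>";

    // Returns one {name, price} pair per product block.
    static List<String[]> extract(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList products = (NodeList) xpath.evaluate(
            "//div[@class='product']", doc, XPathConstants.NODESET);
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < products.getLength(); i++) {
            Node product = products.item(i);
            // Relative expressions are evaluated against the current product node.
            String name = xpath.evaluate("span[@class='name']", product);
            String price = xpath.evaluate("span[@class='price']", product);
            rows.add(new String[] {name, price});
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (String[] row : extract(SAMPLE)) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

Running `main` prints `Kettle -> 24.99` and `Toaster -> 39.50`. Each row would then become one INSERT in the database step.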

USE CASE 2

Scrape JavaScript-rendered pages by connecting a browser-automation block in the visual flowchart editor.

USE CASE 3

Trigger scraping jobs from another system by calling spider-flow's built-in HTTP API.
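This page does not list the HTTP API's endpoints. The following is therefore a minimal client-side sketch that treats the base URL `http://localhost:8088`, the path template `/rest/run/%s`, the POST method, and the task ID as assumptions to check against the spider-flow source before use. It uses `java.net.http` (Java 11+) on the calling side; spider-flow itself only needs Java 1.8. The `TriggerJob` class is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class TriggerJob {
    // Assumed values: verify against spider-flow's REST controller.
    static final String BASE_URL = "http://localhost:8088";
    static final String RUN_PATH = "/rest/run/%s"; // hypothetical path template

    // Builds (but does not send) a request that asks the server to run one task.
    static HttpRequest buildRunRequest(String baseUrl, String taskId) {
        URI uri = URI.create(baseUrl + String.format(RUN_PATH, taskId));
        return HttpRequest.newBuilder(uri)
            .timeout(Duration.ofSeconds(30))
            .POST(HttpRequest.BodyPublishers.noBody()) // assumed HTTP method
            .build();
    }

    public static void main(String[] args) throws Exception {
        String taskId = args.length > 0 ? args[0] : "example-task-id"; // hypothetical ID
        HttpRequest request = buildRunRequest(BASE_URL, taskId);
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Building the request is separate from sending it, so the URL can be checked offline before `main` is pointed at a running instance.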

USE CASE 4

Set up a recurring scraper with automatic proxy rotation to collect data from multiple sources on a schedule.
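The page does not describe how spider-flow's IP proxy pool plugin picks a proxy. As a generic illustration of proxy rotation only, here is a round-robin sketch in plain JDK Java. The `ProxyRotator` class and the placeholder addresses are hypothetical and do not reflect the plugin's logic.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class ProxyRotator {
    private final List<Proxy> proxies;
    private final AtomicInteger next = new AtomicInteger();

    ProxyRotator(List<Proxy> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("need at least one proxy");
        }
        this.proxies = List.copyOf(proxies); // immutable snapshot of the pool
    }

    // Returns proxies in round-robin order; floorMod keeps the index valid
    // even after the counter overflows. Safe to call from several threads.
    Proxy nextProxy() {
        int i = Math.floorMod(next.getAndIncrement(), proxies.size());
        return proxies.get(i);
    }

    public static void main(String[] args) {
        // Placeholder addresses; a real pool would come from a proxy provider.
        ProxyRotator rotator = new ProxyRotator(List.of(
            new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved("10.0.0.1", 8080)),
            new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved("10.0.0.2", 8080))));
        for (int i = 0; i < 4; i++) {
            System.out.println(rotator.nextProxy().address());
        }
    }
}
```

Each returned `Proxy` can be passed to `URL.openConnection(Proxy)` so that consecutive requests leave through different addresses.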

Tech stack

Java · Spring Boot · Selenium · Redis · MongoDB · MySQL

Getting it running

Difficulty · moderate · Time to first run · 1h+

Requires Java 1.8+, a relational database, and optional plugin setup for Selenium, Redis, or MongoDB features.

Use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

spider-flow is a visual web scraping platform: instead of writing code, you build a scraper by drawing a flowchart. Each block in the diagram defines a step, such as what to fetch, what data to extract, and where to store the results. The README is written in Chinese, but the features are documented in a structured list.

The platform can extract data from web pages using several methods: XPath (selecting nodes by their path through the HTML document tree), CSS selectors (matching elements by tag, class, ID, or attribute), JsonPath (querying JSON data), and regular expressions. It also handles pages that load their content dynamically through JavaScript or AJAX requests, not just static HTML. Proxy support is included, and cookies are managed automatically.

Scraped data can be saved directly to a database using standard SQL operations (select, insert, update, delete) or written to files, and multiple database connections can be configured. A task monitoring panel and a log viewer show which scrapers are running and what happened during each run. The platform also exposes an HTTP API, so other systems can trigger scraper jobs programmatically.

A plugin system extends the core platform. Available plugins include Selenium (browser automation), Redis (caching or queuing), MongoDB, cloud object storage, an IP proxy pool, OCR (reading text from images), and email. You can also write custom functions and custom executor plugins.

The project includes a disclaimer stating that it should not be used for illegal purposes or in ways that violate websites' terms of service. It requires Java 1.8 or higher and is released under the MIT license.
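Of the extraction methods spider-flow supports, regular expressions are the simplest to illustrate outside the platform. The sketch below is plain JDK Java, not spider-flow's own extractor. The `RegexExtract` class, the sample text, and the price pattern are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexExtract {
    // Hypothetical pattern: a "$" followed by digits and optional two-digit cents.
    // Group 1 captures the number without the currency sign.
    static final Pattern PRICE = Pattern.compile("\\$(\\d+(?:\\.\\d{2})?)");

    // Collects every price found in the text, in order of appearance.
    static List<String> prices(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = PRICE.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(prices("Kettle $24.99, Toaster $39.50, Mug $7"));
        // prints [24.99, 39.50, 7]
    }
}
```

Regular expressions suit loosely structured text like this; for nested markup, XPath or CSS selectors are usually more robust.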

Copy-paste prompts

Prompt 1
I have spider-flow running. Walk me through building a flowchart scraper that extracts product names and prices from an e-commerce site using CSS selectors, then inserts the rows into a MySQL table.
Prompt 2
How do I configure the Selenium plugin in spider-flow to handle a page that loads its content via JavaScript after a 2-second delay?
Prompt 3
Write an HTTP API request to trigger a spider-flow scraping task programmatically and check the job status afterward.
Prompt 4
How do I enable the OCR plugin in spider-flow to extract text from CAPTCHA or image-based content on a target page?

Verify against the repo before relying on details.