spiderclub/haipproxy

★ 5,537 · Python · Audience: developer · Complexity: 3/5 · Setup: moderate

Mindmap

mindmap
  root((Haipproxy))
    What it does
      Collect proxy IPs
      Validate proxies
      Rotate proxies
    Tech Stack
      Python
      Scrapy
      Redis
      Docker
    Use Cases
      Web scraping
      Anti-blocking
      Proxy pool API
    Features
      Squid integration
      Prometheus metrics
      Docker Compose setup


Things people build with this

USE CASE 1

Build a web scraper that automatically rotates through working proxy IPs to avoid being blocked by target websites.

USE CASE 2

Run a self-hosted proxy pool that continuously crawls public proxy lists and checks which ones are still alive.

USE CASE 3

Route any HTTP-aware tool through a Squid server that automatically pulls live proxies from the pool in the background.

USE CASE 4

Monitor proxy pool health over time using built-in Prometheus and Grafana metrics.
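Use case 3 depends on Squid forwarding each request to an upstream proxy. One common way to do that in Squid is the `cache_peer` directive. The sketch below assumes that approach: it renders one `cache_peer` line per live proxy, followed by `never_direct allow all` so Squid never bypasses the peers. The function name `render_cache_peers` and the chosen option values are illustrative assumptions, not haipproxy's actual integration code.

```python
"""Render Squid cache_peer lines for a list of live proxies (illustrative sketch)."""

from typing import Iterable


def render_cache_peers(proxies: Iterable[str], max_conn: int = 5) -> str:
    """Return squid.conf lines that route traffic through each proxy in turn.

    `proxies` holds "host:port" strings, e.g. as pulled from the pool.
    The option set is an assumption; check it against your Squid version.
    """
    lines = []
    for i, proxy in enumerate(proxies):
        host, port = proxy.rsplit(":", 1)
        # parent peer, ICP port 0 + no-query (no ICP), rotate between peers
        lines.append(
            f"cache_peer {host} parent {int(port)} 0 no-query "
            f"round-robin connect-fail-limit=2 max-conn={max_conn} "
            f"name=proxy-{i}"
        )
    # Force every request through a peer instead of fetching directly.
    lines.append("never_direct allow all")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_cache_peers(["192.0.2.1:8080", "192.0.2.2:3128"]), end="")
```

After writing these lines into squid.conf, Squid must reload its configuration (for example with `squid -k reconfigure`). Regenerating the peer list on a timer is what lets Squid follow the pool as proxies come and go.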

Tech stack

Python · Scrapy · Redis · Docker · Squid · Prometheus · Grafana

Getting it running

Difficulty: moderate · Time to first run: about 30 min

Requires Redis, either running separately or started via Docker Compose. Some proxy sources may be blocked in certain network environments.

In plain English

Haipproxy is a tool for building and running your own pool of working proxy IP addresses. A proxy IP is an intermediary address you route internet requests through; web scrapers commonly use proxies to avoid being blocked by websites. This project collects proxy IPs from public sources on the internet, tests them to confirm they actually work, and keeps them organized in a database so your scraper can pull a fresh, working address whenever it needs one.

The system is built around two main components. Scrapy handles the crawling side: it fetches and filters IP addresses from various public proxy listing sites. Redis acts as the shared store that every part of the system reads from and writes to, holding both the raw IP lists and the validation results. You run separate processes for crawling (collecting new proxy IPs), validating (checking whether those IPs are still alive), and scheduling (deciding when to re-run these jobs on a timer).

Clients connect to the pool in two ways. There is a Python client library you import directly into your scraper code, calling a simple function to get one working IP or a list of them. There is also a Squid integration, in which Squid acts as a local proxy server that automatically pulls addresses from the pool in the background, so any tool that supports HTTP proxies can use haipproxy without code changes.

Deployment can be done on a single machine by installing Python, Redis, and the project dependencies, then starting the crawler, validator, and scheduler processes individually. There is also a Docker Compose configuration that starts all components together in containers, including Squid.

For monitoring, the project supports Sentry for tracking errors and unexpected crashes, and Prometheus combined with Grafana for watching metrics such as how many proxies are available and how healthy the system is over time.
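To make the crawler, validator, and client split concrete, here is a minimal sketch of the shared pool, assuming a simple scoring scheme: passing checks raise a proxy's score, failures lower it, and proxies that fall below a floor are dropped. A Python dict stands in for Redis, and `ProxyPool`, `record_check`, `get_proxy`, and `get_proxies` are hypothetical names; haipproxy's real client and Redis layout differ, so check the repo's README for its actual API.

```python
"""Minimal in-memory sketch of a scored proxy pool (not haipproxy's code).

A dict stands in for the Redis data shared by the crawler, validator and
client processes. ProxyPool and its method names are hypothetical.
"""

from itertools import cycle
from typing import Dict, Iterator, List, Optional


class ProxyPool:
    def __init__(self, min_score: int = 1, max_score: int = 10) -> None:
        self.min_score = min_score  # proxies scoring below this are evicted
        self.max_score = max_score  # upper bound on any proxy's score
        self._scores: Dict[str, int] = {}
        self._rotation: Optional[Iterator[str]] = None  # reset on every change

    def add(self, proxy: str, score: int = 5) -> None:
        """Crawler side: register a newly discovered "host:port" proxy."""
        self._scores.setdefault(proxy, score)
        self._rotation = None

    def record_check(self, proxy: str, alive: bool) -> None:
        """Validator side: reward a passing proxy, penalize a failing one."""
        if proxy not in self._scores:
            return
        if alive:
            self._scores[proxy] = min(self._scores[proxy] + 1, self.max_score)
        else:
            self._scores[proxy] -= 1
            if self._scores[proxy] < self.min_score:
                del self._scores[proxy]  # evict proxies that keep failing
        self._rotation = None

    def get_proxies(self) -> List[str]:
        """Client side: every usable proxy, highest score first."""
        return sorted(self._scores, key=lambda p: (-self._scores[p], p))

    def get_proxy(self) -> Optional[str]:
        """Client side: the next proxy in round-robin order, or None if empty."""
        if not self._scores:
            return None
        if self._rotation is None:
            self._rotation = cycle(self.get_proxies())
        return next(self._rotation)


if __name__ == "__main__":
    pool = ProxyPool()
    pool.add("192.0.2.1:8080")
    pool.add("192.0.2.2:3128")
    pool.record_check("192.0.2.1:8080", alive=True)
    print(pool.get_proxies())
    print([pool.get_proxy() for _ in range(3)])
```

In this sketch each process touches only its own side: the crawler calls `add`, the validator calls `record_check`, and a scraper calls `get_proxy` before each request (or `get_proxies` for a batch). Round-robin order spreads load across proxies; a real deployment would keep the scores in Redis so separate processes see the same state.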
The README notes that some proxy sources may be blocked in certain network environments, and there is a configuration flag to disable crawling those particular sources if needed.
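The effect of such a flag can be sketched as a filter applied before crawl tasks are scheduled. Everything named here (`Source`, the `blocked` field, `SKIP_BLOCKED_SOURCES`, `sources_to_crawl`, and the example URLs) is a hypothetical stand-in; the real flag name and where it lives are in the repo's configuration files.

```python
"""Sketch: skip crawl sources that are unreachable from your network.

Source, SKIP_BLOCKED_SOURCES and sources_to_crawl are hypothetical
stand-ins for haipproxy's real source rules and configuration flag.
"""

from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Source:
    name: str
    url: str
    blocked: bool  # True if this site is unreachable from your network


SOURCES = [
    Source("source-a", "https://example.com/free-proxies", blocked=False),
    Source("source-b", "https://example.org/proxy-list", blocked=True),
]
SKIP_BLOCKED_SOURCES = True  # hypothetical flag, flip to False to crawl all


def sources_to_crawl(sources: List[Source], skip_blocked: bool) -> List[Source]:
    """Return the sources the crawler should schedule, honoring the flag."""
    return [s for s in sources if not (skip_blocked and s.blocked)]


if __name__ == "__main__":
    for source in sources_to_crawl(SOURCES, SKIP_BLOCKED_SOURCES):
        print(source.name, source.url)
```

Filtering at scheduling time, rather than letting requests to blocked sites time out, keeps the crawler from wasting workers on sources that can never respond.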

Copy-paste prompts

Prompt 1

How do I deploy haipproxy with Docker Compose so the crawler, validator, and scheduler all start automatically together?

Prompt 2

Show me how to call the haipproxy Python client inside a Scrapy spider to get a working proxy IP for each request.

Prompt 3

How do I configure haipproxy to skip proxy sources that are blocked in my network environment?

Prompt 4

How do I set up the Prometheus and Grafana monitoring integration for haipproxy to track how many live proxies are available?

Prompt 5

What is the difference between using the haipproxy Python client and the Squid integration, and when should I choose each?


Verify against the repo before relying on details.