Analysis updated 2026-05-18
Triage a batch of URLs before submitting them to Google by running an AI quality score on each page's content.
Search your entire crawl history by concept rather than keyword to find content gaps in a specific topic area.
Manage separate crawl workspaces for multiple SEO clients in one self-hosted dashboard with CSV export.
| | devdaim6/seo-crawler-triage-engine | airirang/airirang-builder | aisurfer/mcp_ui_app_example |
|---|---|---|---|
| Stars | 0 | 0 | 0 |
| Language | TypeScript | TypeScript | TypeScript |
| Setup difficulty | hard | moderate | moderate |
| Complexity | 4/5 | 3/5 | 3/5 |
| Audience | developer | developer | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires Docker, Postgres with pgvector, and Redis. A Groq API key or local Ollama installation is needed for AI analysis. Deploy with an ADMIN_PASSWORD if hosting publicly.
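The requirements above could be checked at startup with something like the following minimal sketch. Only `ADMIN_PASSWORD` is named in the description. `GROQ_API_KEY`, `OLLAMA_BASE_URL`, and the `checkConfig` helper are hypothetical names chosen for illustration, so check the repo for the variables it actually reads.

```typescript
// Environment passed in as a plain record so the check is easy to test.
type Env = Record<string, string | undefined>;

// Returns a list of human-readable problems; an empty list means the
// configuration satisfies the requirements described above.
// GROQ_API_KEY and OLLAMA_BASE_URL are assumed names, not taken from the repo.
function checkConfig(env: Env, publiclyHosted: boolean): string[] {
  const problems: string[] = [];

  const hasGroq = Boolean(env["GROQ_API_KEY"]);
  const hasOllama = Boolean(env["OLLAMA_BASE_URL"]);
  if (!hasGroq && !hasOllama) {
    problems.push("AI analysis needs a Groq API key or a local Ollama installation.");
  }

  // ADMIN_PASSWORD is the one variable the description names explicitly.
  if (publiclyHosted && !env["ADMIN_PASSWORD"]) {
    problems.push("Set ADMIN_PASSWORD before hosting publicly.");
  }

  return problems;
}
```

In a Node.js process this would typically be called as `checkConfig(process.env, true)`, refusing to start if the returned list is non-empty.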
This is a self-hosted web crawler and SEO analysis tool aimed at teams and agencies who manage Google indexing for large numbers of pages. The core idea: instead of blindly submitting thousands of URLs to Google and hoping they get indexed, you run them through this pipeline first, and an AI evaluates each page's quality before you waste your indexing budget on thin or low-value content.
When you submit a list of URLs, the tool fetches each one, parses the page content, runs a Google PageSpeed technical audit, then sends the extracted content to an AI model (either Groq in the cloud or a local Ollama model running on your own machine). The AI categorizes the content, identifies the search intent behind the page, and scores it against Google's E-E-A-T framework (Experience, Expertise, Authoritativeness, Trustworthiness). That score acts as a triage signal: pages with strong scores are worth submitting to Google, while pages with weak scores need improvement first.
Beyond the E-E-A-T analysis, the tool converts every crawled page into vector embeddings stored in a Postgres database using the pgvector extension. This turns your entire crawl history into a semantic search engine, so a content team can search for concepts like "articles about sustainable finance" rather than exact keywords, helping them spot topical gaps in their content library.
The dashboard shows real-time crawl progress using live streaming, so you can watch each URL move through the fetching, DOM audit, AI scan, and storage phases. Client workspaces let agencies keep each client's data separate, and results can be exported to CSV.
Deployment is via Docker Compose, which sets up the Next.js frontend, Node.js backend, Postgres, and Redis in one command. A Groq API key is optional but recommended for speed; Ollama supports fully local and private processing.
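The triage step described above can be pictured as a simple threshold rule over the AI score. The sketch below is an illustration only: the `PageAnalysis` shape, the 0 to 100 scale, and the cut-off of 70 are all assumptions, not values taken from the repo.

```typescript
// Hypothetical shape of one analysed page; field names are illustrative.
interface PageAnalysis {
  url: string;
  eeatScore: number; // assumed to be on a 0-100 scale
}

type TriageDecision = "submit" | "improve";

// Assumed cut-off; the real tool may use a different scale or rule.
const SUBMIT_THRESHOLD = 70;

// Strong scores are worth submitting; weak scores need work first.
function triage(page: PageAnalysis, threshold: number = SUBMIT_THRESHOLD): TriageDecision {
  return page.eeatScore >= threshold ? "submit" : "improve";
}

// Splits a batch of analysed pages into the two triage buckets.
function partitionBatch(
  pages: PageAnalysis[],
  threshold: number = SUBMIT_THRESHOLD
): { submit: string[]; improve: string[] } {
  const submit: string[] = [];
  const improve: string[] = [];
  for (const page of pages) {
    if (triage(page, threshold) === "submit") {
      submit.push(page.url);
    } else {
      improve.push(page.url);
    }
  }
  return { submit, improve };
}
```

Keeping the threshold as a parameter would let each client workspace apply its own bar without changing the scoring itself.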
A self-hosted crawler that fetches URLs, scores them with AI for Google indexability, and stores the results in a searchable vector database for SEO triage.
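To show what searching a vector store by concept involves, here is a minimal in-memory sketch that ranks stored pages by cosine similarity to a query embedding. It is a stand-in, not the repo's implementation: in the real tool the comparison would presumably run inside Postgres via pgvector, and the `StoredPage` type and `searchByConcept` function are hypothetical names.

```typescript
// Hypothetical record for a crawled page and its embedding vector.
interface StoredPage {
  url: string;
  embedding: number[];
}

// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction to compare
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the pages closest in meaning to the query, most similar first.
function searchByConcept(
  queryEmbedding: number[],
  pages: StoredPage[],
  limit: number = 5
): { url: string; similarity: number }[] {
  return pages
    .map((page) => ({
      url: page.url,
      similarity: cosineSimilarity(queryEmbedding, page.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

The query text (for example "articles about sustainable finance") would first be turned into an embedding by the same model used for the crawled pages. A topic that returns only low-similarity results is a candidate content gap.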
Mainly TypeScript. The stack also includes Next.js and Node.js.
MIT license: use, modify, and distribute freely for any purpose, including commercial projects.
Setup difficulty is rated hard, with roughly 1h+ to a first successful run.
Aimed mainly at developers.
Verify against the repo before relying on details.