explaingit

devdaim6/seo-crawler-triage-engine

Analysis updated 2026-05-18

Stars · 0 | Language · TypeScript | Audience · developer | Complexity · 4/5 | License · MIT | Setup · hard

TLDR

A self-hosted crawler that fetches URLs, scores them with AI for Google indexability, and stores the results in a searchable vector database for SEO triage.

Mindmap

mindmap
  root((seo-crawler))
    What it does
      Crawl and analyze URLs
      AI E-E-A-T scoring
      Semantic vector search
    Tech Stack
      Next.js TypeScript
      Node.js PostgreSQL Redis
      pgvector embeddings
      Groq Ollama AI
    Use Cases
      Pre-indexing triage
      Content gap analysis
      Agency client workspaces
    Setup
      Docker Compose
      Groq or local Ollama
    Output
      Dashboard + CSV export


What do people build with it?

USE CASE 1

Triage a batch of URLs before submitting them to Google by running an AI quality score on each page's content.

USE CASE 2

Search your entire crawl history by concept rather than keyword to find content gaps in a specific topic area.

USE CASE 3

Manage separate crawl workspaces for multiple SEO clients in one self-hosted dashboard with CSV export.
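The triage and export use cases above can be sketched in a few lines. This is a minimal illustration, not the repo's implementation: the `CrawlResult` shape, the 0-100 score scale, the cutoff of 70, and the CSV column names are all assumptions made for the example.

```typescript
// Hypothetical shape of one crawl result; field names are illustrative,
// not the repo's actual schema.
interface CrawlResult {
  url: string;
  workspace: string;
  eeatScore: number; // assumed to be on a 0-100 scale
}

// Assumed cutoff: pages at or above it are treated as ready to submit.
const SUBMIT_THRESHOLD = 70;

// Split results into pages worth submitting and pages that need work first.
function triage(results: CrawlResult[], threshold: number = SUBMIT_THRESHOLD) {
  const submit = results.filter((r) => r.eeatScore >= threshold);
  const improve = results.filter((r) => r.eeatScore < threshold);
  return { submit, improve };
}

// Quote a CSV field when it contains a comma, double quote, or line break.
function csvField(value: string): string {
  return /[",\n\r]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Export one workspace's results as CSV text with a header row.
function toCsv(results: CrawlResult[], workspace: string): string {
  const header = "url,workspace,eeat_score";
  const rows = results
    .filter((r) => r.workspace === workspace)
    .map((r) =>
      [csvField(r.url), csvField(r.workspace), String(r.eeatScore)].join(",")
    );
  return [header, ...rows].join("\n");
}
```

Filtering by workspace before export mirrors the idea of keeping each client's data separate; the real tool may enforce that separation at the database level instead.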

What is it built with?

TypeScript, Next.js, Node.js, PostgreSQL, Redis, pgvector, Groq, Ollama

How does it compare?

| | devdaim6/seo-crawler-triage-engine | airirang/airirang-builder | aisurfer/mcp_ui_app_example |
| --- | --- | --- | --- |
| Stars | 0 | 0 | 0 |
| Language | TypeScript | TypeScript | TypeScript |
| Setup difficulty | hard | moderate | moderate |
| Complexity | 4/5 | 3/5 | 3/5 |
| Audience | developer | developer | developer |

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1h+

Requires Docker, Postgres with pgvector, and Redis. A Groq API key or local Ollama installation is needed for AI analysis. Deploy with an ADMIN_PASSWORD if hosting publicly.
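A pre-flight check along these lines can catch a missing setting before the first crawl. It is only a sketch: `ADMIN_PASSWORD` comes from the description above, while `DATABASE_URL`, `REDIS_URL`, `GROQ_API_KEY`, and `OLLAMA_BASE_URL` are assumed variable names that may differ from the ones the repo actually reads.

```typescript
// Environment passed in explicitly so the check is easy to test.
type Env = Record<string, string | undefined>;

// Returns a list of human-readable problems; an empty list means the
// configuration looks usable. Variable names other than ADMIN_PASSWORD
// are assumptions for illustration.
function checkEnv(env: Env, publiclyHosted: boolean): string[] {
  const problems: string[] = [];

  // Postgres (with pgvector) and Redis are both required.
  for (const key of ["DATABASE_URL", "REDIS_URL"]) {
    if (!env[key]) problems.push(`${key} is not set`);
  }

  // AI analysis needs either a Groq key or a local Ollama endpoint.
  if (!env["GROQ_API_KEY"] && !env["OLLAMA_BASE_URL"]) {
    problems.push("set GROQ_API_KEY or OLLAMA_BASE_URL for AI analysis");
  }

  // A public deployment should be protected by an admin password.
  if (publiclyHosted && !env["ADMIN_PASSWORD"]) {
    problems.push("ADMIN_PASSWORD is required when hosting publicly");
  }

  return problems;
}
```

In a Node.js process this could be called as `checkEnv(process.env, true)` at startup, exiting early if the returned list is not empty.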

MIT license: use, modify, and distribute freely for any purpose, including commercial projects.

In plain English

This is a self-hosted web crawler and SEO analysis tool aimed at teams and agencies who manage Google indexing for large numbers of pages. The core idea: instead of blindly submitting thousands of URLs to Google and hoping they get indexed, you run them through this pipeline first, and an AI evaluates each page's quality before you waste your indexing budget on thin or low-value content.

When you submit a list of URLs, the tool fetches each one, parses the page content, runs a Google PageSpeed technical audit, then sends the extracted content to an AI model (either Groq in the cloud or a local Ollama model running on your own machine). The AI categorizes the content, identifies the search intent behind the page, and scores it against Google's E-E-A-T framework (Experience, Expertise, Authoritativeness, Trustworthiness). That score acts as a triage signal: pages with strong scores are worth submitting to Google, while pages with weak scores need improvement first.

Beyond the E-E-A-T analysis, the tool converts every crawled page into vector embeddings stored in a Postgres database using the pgvector extension. This turns your entire crawl history into a semantic search engine, so a content team can search for concepts like "articles about sustainable finance" rather than exact keywords, helping them spot topical gaps in their content library.

The dashboard shows real-time crawl progress using live streaming, so you can watch each URL move through the fetching, DOM audit, AI scan, and storage phases. Client workspaces let agencies keep each client's data separate, and results can be exported to CSV.

Deployment is via Docker Compose, which sets up the Next.js frontend, Node.js backend, Postgres, and Redis in one command. A Groq API key is optional but recommended for speed; Ollama supports fully local and private processing.
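The semantic search idea can be illustrated with cosine similarity over embeddings. The sketch below ranks pages in memory and uses toy vectors; it is not the repo's code. In the actual tool the comparison would presumably run inside Postgres, where pgvector's `<=>` operator computes cosine distance. The `EmbeddedPage` shape and function names are hypothetical.

```typescript
// Hypothetical record: a crawled page and its embedding vector.
interface EmbeddedPage {
  url: string;
  embedding: number[];
}

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank pages by similarity to the query embedding and keep the top k.
function searchPages(query: number[], pages: EmbeddedPage[], k: number = 3) {
  return pages
    .map((p) => ({ url: p.url, score: cosineSimilarity(query, p.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Because the query is itself an embedding, a phrase like "articles about sustainable finance" would first be passed through the same embedding model as the pages, so that concept-level neighbors rank highly even without shared keywords.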

Copy-paste prompts

Prompt 1
I set up seo-crawler-triage-engine with Docker. How do I submit a list of 500 URLs from a CSV file for bulk crawling and E-E-A-T analysis?
Prompt 2
How do I switch seo-crawler-triage-engine from using Groq to a local Ollama model for fully private analysis?
Prompt 3
I want to use the semantic search feature in seo-crawler-triage-engine to find all crawled pages related to a specific topic. How do I run a vector search query?
Prompt 4
How do I add a new client workspace in seo-crawler-triage-engine and export its crawl results to CSV for a stakeholder report?

Frequently asked questions

What is seo-crawler-triage-engine?

A self-hosted crawler that fetches URLs, scores them with AI for Google indexability, and stores the results in a searchable vector database for SEO triage.

What language is seo-crawler-triage-engine written in?

Mainly TypeScript. The stack also includes Next.js and Node.js.

What license does seo-crawler-triage-engine use?

MIT license: use, modify, and distribute freely for any purpose, including commercial projects.

How hard is seo-crawler-triage-engine to set up?

Setup difficulty is rated hard, with an hour or more to a first successful run.

Who is seo-crawler-triage-engine for?

Mainly developers.


Verify against the repo before relying on details.