Build a chatbot grounded in your product's documentation so customers can get instant answers.
Create an AI assistant for your company's help center or internal knowledge base.
Turn any public website into a shareable custom GPT, with no manual data entry.
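Requires Node.js 16 or higher to run the crawler (Docker is optional); the starting URL and link pattern must be configured for each site. An OpenAI account is needed for the upload step, not for crawling.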
GPT Crawler is a tool that automatically visits the pages of a website and saves their text content into a single JSON file, which you can upload to OpenAI to create a custom AI assistant that answers from that site's content. In other words, it lets you turn a documentation site or other web resource into the knowledge base for your own chatbot, without manual copy-pasting.

The way it works: you give it a starting URL and a pattern for which links to follow (for example, "start at the developer docs homepage and follow any link that matches /docs/**"). You can also specify a CSS selector, which identifies the part of the page that contains the useful text, so the crawler skips navigation menus, footers, and other noise. The crawler visits each matching page, extracts the relevant text, and saves everything to a file called output.json. Configuration options let you cap how many pages it visits, limit the output file size, and exclude certain file types from being fetched.

Once you have the output file, you upload it to OpenAI's platform to power either a "custom GPT" (a shareable chatbot you build through OpenAI's web interface) or a "custom assistant" (an AI you integrate into your own product via the API). The README includes a step-by-step walkthrough for both paths.

You would use this when you want an AI assistant that knows the content of a specific website, such as a product's documentation, a company's help center, or your own site, and can answer questions about it. The tool is written in TypeScript and runs on Node.js (version 16 or higher). It can also be run inside a Docker container or started as an API server.
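As a rough sketch of the configuration described above, the TypeScript below models a crawl config and a simplified link filter. The `CrawlConfig` interface and its field names, the `example.com` URLs, the megabyte unit for `maxFileSize`, and the `matchesPattern` helper are all illustrative assumptions, not the tool's actual API; check the repo for its real config type and for how it evaluates link patterns.

```typescript
// Hypothetical config shape for illustration only; the real field names
// and types should be verified against the repo's own config definition.
interface CrawlConfig {
  url: string;                   // starting URL for the crawl
  match: string;                 // glob pattern for links to follow
  selector?: string;             // CSS selector for the useful page text
  maxPagesToCrawl: number;       // cap on the number of pages visited
  outputFileName: string;        // where the extracted text is written
  maxFileSize?: number;          // output size limit (assumed unit: MB)
  resourceExclusions?: string[]; // file extensions that are not fetched
}

const exampleConfig: CrawlConfig = {
  url: "https://example.com/docs",
  match: "https://example.com/docs/**",
  selector: "main",
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
  maxFileSize: 1,
  resourceExclusions: ["png", "jpg", "pdf"],
};

// Simplified glob matcher (an assumption, not the crawler's own logic):
// "**" matches any characters, "*" matches any characters except "/".
function matchesPattern(url: string, glob: string): boolean {
  // Escape regex metacharacters other than "*".
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  const source = escaped
    .split("**")
    .map((part) => part.replace(/\*/g, "[^/]*"))
    .join(".*");
  return new RegExp(`^${source}$`).test(url);
}

// Decide which discovered links the crawl would follow.
const discoveredLinks: string[] = [
  "https://example.com/docs/guides/setup",
  "https://example.com/docs/api",
  "https://example.com/blog/launch",
];

const followedLinks = discoveredLinks.filter((link) =>
  matchesPattern(link, exampleConfig.match)
);

console.log(followedLinks);
```

Under these assumptions, the two `/docs/` links pass the filter and the `/blog/` link is skipped, which mirrors the "follow any link that matches /docs/**" behavior described above.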
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.