explaingit

apify/crawlee

📈 Trending | 23,318 | TypeScript | Audience · developer | Complexity · 3/5 | Active | License | Setup · easy

TLDR

Web scraping and browser automation library for Node.js that handles JavaScript-heavy sites, proxy rotation, and URL queue management automatically.

Mindmap

mindmap
  root((Crawlee))
    What it does
      Scrapes websites
      Controls browsers
      Manages URL queues
      Rotates proxies
    How it works
      Playwright or Puppeteer
      HTTP requests
      Browser fingerprints
      Error handling
    Use cases
      Price comparison
      News aggregation
      Training data
      Competitor monitoring
    Tech stack
      Node.js runtime
      TypeScript
      Playwright
      Puppeteer
    Output options
      Local disk storage
      Cloud storage
      Structured data

Things people build with this

USE CASE 1

Build a price comparison tool that automatically collects product prices from multiple e-commerce sites.

USE CASE 2

Aggregate news articles and headlines from multiple news websites into a single feed.

USE CASE 3

Collect training data for machine learning models by scraping images, text, and metadata from websites.

USE CASE 4

Monitor competitor websites for price changes, new product launches, or inventory updates.

Tech stack

TypeScript | Node.js | Playwright | Puppeteer

Getting it running

Difficulty · easy | Time to first run · 5 min
Use freely for any purpose, including commercial use, as long as you keep the copyright notice and license text.

In plain English

Crawlee is a web scraping and browser automation library for Node.js. Web scraping means automatically visiting websites and extracting information from them, like prices, product listings, article text, or any other data you can see in a browser. Crawlee makes this easier by handling the repetitive, technical work for you.

The problem it solves is that scraping modern websites is hard: pages load content using JavaScript, websites detect and block automated requests, and managing a queue of thousands of URLs while handling errors and retries gets complex fast. Crawlee handles all of this. It can control real browsers (via Playwright or Puppeteer) to scrape JavaScript-heavy sites, or use fast HTTP requests for simpler pages. It automatically rotates proxies to avoid blocks, generates realistic browser fingerprints to appear human-like, manages a queue of URLs to visit, and saves collected data to disk or cloud storage.

You would use this if you need to extract data from websites at scale, for example, to build a price comparison tool, aggregate news articles, collect training data for AI, or monitor competitor websites. It works in JavaScript and TypeScript and runs on Node.js. It is developed by Apify, a company that provides cloud infrastructure for running scrapers, though Crawlee itself runs anywhere.
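
To make the queue, retry, and proxy-rotation ideas above concrete, here is a minimal, dependency-free TypeScript sketch of the general technique. It is not Crawlee's API or implementation: the names `crawl`, `Fetcher`, `CrawlResult`, and `maxRetries`, the round-robin proxy choice, and the "direct" fallback label are all assumptions made for illustration. Check the Crawlee documentation for the library's real classes and options.

```typescript
// Conceptual sketch only: NOT Crawlee's API. All names here are hypothetical.

// A function that downloads one page through a given proxy.
// Passed in by the caller so the sketch stays independent of any HTTP library.
type Fetcher = (url: string, proxy: string) => Promise<string>;

interface CrawlResult {
  succeeded: Map<string, string>; // url -> page body
  failed: string[];               // urls that used up all their retries
}

async function crawl(
  startUrls: string[],
  proxies: string[],
  fetchPage: Fetcher,
  maxRetries = 2,
): Promise<CrawlResult> {
  // De-duplicate URLs so each one is queued only once.
  const seen = new Set<string>();
  const queue: { url: string; retries: number }[] = [];
  for (const url of startUrls) {
    if (!seen.has(url)) {
      seen.add(url);
      queue.push({ url, retries: 0 });
    }
  }

  const succeeded = new Map<string, string>();
  const failed: string[] = [];
  let attempt = 0;

  while (queue.length > 0) {
    const item = queue.shift()!;
    // Round-robin proxy rotation; fall back to a "direct" label if no proxies are given.
    const proxy = proxies[attempt % proxies.length] ?? "direct";
    attempt++;
    try {
      succeeded.set(item.url, await fetchPage(item.url, proxy));
    } catch {
      if (item.retries < maxRetries) {
        // Put the URL at the back of the queue and try again later.
        queue.push({ url: item.url, retries: item.retries + 1 });
      } else {
        failed.push(item.url);
      }
    }
  }
  return { succeeded, failed };
}
```

A caller would pass its own `fetchPage` function (for example, a wrapper around an HTTP client) and read `succeeded` and `failed` from the result. A production crawler adds much more than this, such as concurrency, persistence of the queue, and smarter proxy selection, which is the work a library like Crawlee is meant to take off your hands.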

Copy-paste prompts

Prompt 1
Show me how to set up Crawlee to scrape a list of product prices from an e-commerce site and save them to a CSV file.
Prompt 2
How do I use Crawlee to handle JavaScript-rendered content on a website that loads data dynamically?
Prompt 3
Write a Crawlee script that visits multiple URLs, extracts article titles and links, and stores them in a database.
Prompt 4
How do I configure proxy rotation in Crawlee to avoid getting blocked while scraping a website?
Prompt 5
Show me how to set up error handling and retries in Crawlee for URLs that fail to load.

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.