Analysis updated 2026-07-03
Pull product prices or article headlines from static HTML pages using CSS selectors.
Scrape content from pages that load data via JavaScript after the initial page load.
Route scraping jobs across many different URL patterns each mapped to its own handler.
Build a lightweight data pipeline that extracts, transforms, and collects web data with promise chains.
| | ruipgil/scraperjs | airbnb/polyglot.js | britecharts/britecharts |
|---|---|---|---|
| Stars | 3,720 | 3,721 | 3,722 |
| Language | JavaScript | JavaScript | JavaScript |
| Setup difficulty | moderate | easy | easy |
| Complexity | 2/5 | 2/5 | 2/5 |
| Audience | developer | developer | developer |
Figures from each repo's GitHub metadata at analysis time.
DynamicScraper requires PhantomJS to be installed separately; StaticScraper has no extra dependencies.
Scraperjs is a Node.js library for extracting data from web pages. You point it at a URL, write a short function that picks out the content you want using jQuery-style selectors, and it returns the results. The whole thing is promise-based, so you chain steps together in a readable sequence.

The library offers two scraper types. The StaticScraper is lightweight and fast: it downloads the raw HTML and lets you query it with cheerio, a jQuery-compatible library that runs server-side. It works well for pages whose content is already present in the HTML source. The DynamicScraper uses PhantomJS, a headless browser, which means it can run JavaScript on the page and see content that only appears after scripts execute, similar to what you would see in a real browser. The DynamicScraper is heavier and requires PhantomJS installed separately.

Both scrapers share nearly the same API, so switching between them is straightforward. The scraping function you write is almost identical for both, with one key difference: in the DynamicScraper, the function runs inside the page's sandboxed environment, which means it cannot reference variables from the surrounding Node.js code.

For scraping multiple sites or many different URL patterns, scraperjs includes a Router class. You define URL patterns with named parameters (similar to route patterns in web frameworks), attach a scraper to each pattern, and then feed URLs to the router. It matches each URL to the right scraper and handler automatically.

The API also exposes controls for status code handling, delays, timeouts, async steps, and error catching within the promise chain. The library is installed via npm with a single command. PhantomJS is optional and only required for the DynamicScraper.
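The sandbox restriction on DynamicScraper functions can be illustrated without the library. The sketch below is not scraperjs code. It assumes that a function run inside the page is rebuilt there from its source text, which is how PhantomJS-style page evaluation generally works. Node's built-in `vm` module stands in for the page, and `scrapeFn` and `selector` are hypothetical names chosen for the illustration.

```javascript
// Stand-in for the page sandbox: Node's built-in vm module, not PhantomJS.
const vm = require('vm');

// A variable that exists only in the surrounding Node.js code.
const selector = '.price';

// Hypothetical scraping function that leans on the outer variable.
function scrapeFn() {
  return typeof selector === 'undefined'
    ? 'selector is not visible'
    : 'selector is visible';
}

// Called directly in Node.js, the closure over `selector` works.
const inNode = scrapeFn();

// Rebuilt from its source text in a fresh context, the closure is gone:
// only the function's code travels, not the variables around it.
const inSandbox = vm.runInNewContext('(' + scrapeFn.toString() + ')()');

console.log(inNode);
console.log(inSandbox);
```

In practice this means a DynamicScraper function should be written so that it depends only on what is available inside the page. Check the repo's README for the supported way to pass values in from Node.js.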
A Node.js library for scraping web pages using jQuery-style selectors, with options for both static HTML and JavaScript-rendered pages.
Mainly JavaScript. The stack also includes Node.js and cheerio.
No license information was mentioned in the explanation.
Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.
Mainly developers.
Verify against the repo before relying on details.