explaingit

montferret/ferret

5,981 | Go | Audience · developer | Complexity · 3/5 | License · Apache 2.0 | Setup · moderate

TLDR

A web scraping tool written in Go that lets you pull structured data from websites, including JavaScript-rendered pages, using a simple declarative query language instead of writing raw scraping code.

Mindmap

mindmap
  root((Ferret))
    What it does
      Web scraping
      Structured data collection
    Page types
      Static HTML
      JavaScript pages
    Usage modes
      CLI tool
      Embedded in Go
    Use cases
      Web testing
      Analytics data
      ML training data
    Audience
      Go developers
      Data engineers

Things people build with this

USE CASE 1

Scrape data from a JavaScript-rendered web page that simpler HTML parsers cannot read.

USE CASE 2

Embed Ferret queries in a Go application to collect structured data as part of a larger program.

USE CASE 3

Automate website testing by writing queries that interact with page elements and check the results.

USE CASE 4

Gather training data for a machine learning project by collecting structured content from multiple websites.
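Use case 1 depends on the difference between HTML the server sends and content a script adds afterwards. The sketch below illustrates that difference using only the Go standard library. The two page snippets, the `/api/products` path, and the `extractNames` helper are made up for illustration and are not part of Ferret.

```go
package main

import (
	"fmt"
	"regexp"
)

// staticPage has its product names in the HTML the server returns.
const staticPage = `<ul id="products"><li class="name">Lamp</li><li class="name">Desk</li></ul>`

// dynamicPage ships an empty list; in a real browser the script would fill it.
const dynamicPage = `<ul id="products"></ul>
<script>
  fetch("/api/products").then(r => r.json()).then(render);
</script>`

// nameRe matches the text of <li class="name"> elements. A regex is enough
// for this illustration; real HTML should go through a proper parser.
var nameRe = regexp.MustCompile(`<li class="name">([^<]*)</li>`)

// extractNames returns the product names found in the raw HTML.
// It never executes scripts, which is the limitation being shown.
func extractNames(html string) []string {
	var names []string
	for _, m := range nameRe.FindAllStringSubmatch(html, -1) {
		names = append(names, m[1])
	}
	return names
}

func main() {
	fmt.Println("static page names:", extractNames(staticPage))
	fmt.Println("dynamic page names:", extractNames(dynamicPage))
}
```

The static snippet yields both names, while the dynamic snippet yields none because the list is only filled once a browser runs the script. Closing that gap is what a browser-backed scraper is for.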

Tech stack

Go

Getting it running

Difficulty · moderate | Time to first run · 30 min

Dynamic page scraping requires a Chrome or Chromium browser driver to be installed and running on the system.

Use freely for personal or commercial projects, as long as you include the license notice and state any changes you made to the code.

In plain English

Ferret is a tool for pulling data from websites in a structured way. Instead of writing code that manually clicks through a browser or parses raw HTML, you write queries in Ferret's own declarative language that describe the data you want, and Ferret handles the details of loading pages, interacting with them, and returning results.

It works with both static pages (plain HTML returned from a server) and dynamic pages (ones that load content via JavaScript, like most modern web apps). This makes it useful where simpler scraping tools fail because the content you want only appears after the page finishes running its scripts.

The project is written in Go and can be embedded directly into a Go application, so you can run Ferret queries as part of a larger program rather than only as a standalone tool. There is also a command-line interface for running queries without writing any Go code. The runtime is extensible, meaning you can add custom functions if the built-in ones do not cover your needs.

Common use cases mentioned include testing web applications, collecting data for analytics, and gathering training data for machine learning workflows. The project is licensed under Apache 2.0. A v2 branch with a revised API is in development alongside the stable v1 release.
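To give a feel for what an extensible runtime with custom functions means, here is a minimal sketch of a function registry in Go. It is not Ferret's actual API: the `Func` signature, the `Registry` type, and the `TRIM_PRICE` function are all hypothetical and only show the general pattern of registering a Go function under a name that a query could call.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Func is a hypothetical signature for a function callable from a query.
type Func func(args ...string) (string, error)

// Registry maps query-level function names to Go implementations.
type Registry struct {
	funcs map[string]Func
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{funcs: make(map[string]Func)}
}

// Register adds fn under name (case-insensitive) and rejects duplicates,
// so a custom function cannot silently replace an existing one.
func (r *Registry) Register(name string, fn Func) error {
	key := strings.ToUpper(name)
	if _, exists := r.funcs[key]; exists {
		return fmt.Errorf("function %q is already registered", key)
	}
	r.funcs[key] = fn
	return nil
}

// Call looks up name and invokes the function with args.
func (r *Registry) Call(name string, args ...string) (string, error) {
	fn, ok := r.funcs[strings.ToUpper(name)]
	if !ok {
		return "", fmt.Errorf("unknown function %q", name)
	}
	return fn(args...)
}

// trimPrice is an example custom function: it strips surrounding
// whitespace and a leading dollar sign from a scraped price string.
func trimPrice(args ...string) (string, error) {
	if len(args) != 1 {
		return "", errors.New("TRIM_PRICE expects exactly one argument")
	}
	return strings.TrimPrefix(strings.TrimSpace(args[0]), "$"), nil
}

func main() {
	reg := NewRegistry()
	if err := reg.Register("trim_price", trimPrice); err != nil {
		panic(err)
	}
	out, err := reg.Call("TRIM_PRICE", " $19.99 ")
	fmt.Println(out, err)
}
```

The real registration calls and value types in Ferret differ between the v1 and v2 APIs, so treat this only as a picture of the idea and consult the repo for the actual interface.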

Copy-paste prompts

Prompt 1
Write a Ferret query that opens a JavaScript-rendered product listing page and returns the names and prices of all products as a JSON array.
Prompt 2
Show me how to embed the Ferret runtime in a Go application so I can run scraping queries programmatically from Go code.
Prompt 3
Using the Ferret CLI, write a query that navigates a paginated web table and collects all rows across multiple pages.
Prompt 4
How do I add a custom function to the Ferret runtime in Go to extend it with logic the built-in functions do not cover?

Verify against the repo before relying on details.