Scrape data from a JavaScript-rendered web page that simpler HTML parsers cannot read.
Embed Ferret queries in a Go application to collect structured data as part of a larger program.
Automate website testing by writing queries that interact with page elements and check the results.
Gather training data for a machine learning project by collecting structured content from multiple websites.
Scraping dynamic pages requires a Chrome or Chromium browser to be installed and running on the system, so that Ferret can connect to it and drive it.
Ferret is a tool for pulling data from websites in a structured way. Instead of writing code that manually clicks through a browser or parses raw HTML, you write queries in Ferret's own declarative language, describe the data you want, and Ferret handles the details of loading pages, interacting with them, and returning results.

It works with both static pages (plain HTML returned from a server) and dynamic pages (ones that load content via JavaScript, like most modern web apps). This makes it useful for situations where simpler scraping tools fail because the content you want only appears after the page finishes running its scripts.

The project is written in Go and can be embedded directly into a Go application, so you can run Ferret queries as part of a larger program rather than using it only as a standalone tool. There is also a command-line interface for running queries without writing any Go code. The runtime is extensible, meaning you can add custom functions if the built-in ones do not cover your needs.

Common use cases mentioned include testing web applications, data collection for analytics, and gathering training data for machine learning workflows. The project is licensed under Apache 2.0. A v2 branch with a revised API is in development alongside the stable v1 release.
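The difference between static and dynamic pages described above can be seen with a small experiment. The sketch below does not use Ferret at all. It relies only on the Go standard library, and the page content, the `fetchRaw` and `textOf` helpers, and the local test server are all hypothetical names made up for illustration. It serves a page whose price is filled in by an inline script, fetches the raw HTML the way a simple scraper would, and shows that the heading is readable while the price element is still empty.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// page is a hypothetical document: the heading is present in the markup,
// while the price is only written into the DOM by the inline script.
const page = `<html><body>
<h1>Catalog</h1>
<div id="price"></div>
<script>document.getElementById("price").textContent = "42 EUR";</script>
</body></html>`

// fetchRaw returns the response body exactly as the server sent it.
// No scripts are executed, which is what a plain HTTP scraper sees.
func fetchRaw(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// textOf returns the text between the first occurrence of open and the
// next occurrence of close. It is a deliberately naive stand-in for an
// HTML parser, good enough for this fixed test page only.
func textOf(html, open, close string) (string, bool) {
	start := strings.Index(html, open)
	if start < 0 {
		return "", false
	}
	start += len(open)
	end := strings.Index(html[start:], close)
	if end < 0 {
		return "", false
	}
	return html[start : start+end], true
}

func main() {
	// Serve the test page from a throwaway local server.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, page)
	}))
	defer srv.Close()

	raw, err := fetchRaw(srv.URL)
	if err != nil {
		panic(err)
	}

	heading, _ := textOf(raw, "<h1>", "</h1>")
	price, _ := textOf(raw, `<div id="price">`, "</div>")

	// Prints: heading="Catalog" price=""
	fmt.Printf("heading=%q price=%q\n", heading, price)
}
```

The price stays empty because nothing in this program runs the page's script. Reading it requires a real browser engine to render the page first, which is the gap a browser-backed tool such as Ferret is meant to fill.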
Verify against the repo before relying on details.