explaingit

ericchiang/pup

8,418 · HTML · Audience: developer · Complexity: 1/5 · Setup: easy

TLDR

pup is a command-line tool for pulling data out of HTML pages using CSS selectors. You pipe HTML into it, write a selector, and get matching elements back as text, attributes, or JSON.

Mindmap

mindmap
  root((pup))
    What it does
      CSS selector filter
      HTML to JSON
      Attribute extraction
    Input and output
      Reads from stdin
      Writes to stdout
      Pipe-friendly
    Selector types
      Tag and class
      ID and attribute
      Sibling position
      Text content
    Setup
      Binary download
      Homebrew on Mac
      Build from source

Things people build with this

USE CASE 1

Scrape specific data from a webpage by piping curl output through pup with a CSS selector.
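A minimal sketch of this pipeline, assuming `curl` and `pup` are both on your PATH. The helper name `extract_text` and the URL in the usage comment are placeholders for illustration, not part of pup.

```shell
# Hypothetical helper: read HTML on stdin and print the text of every
# element matching the CSS selector passed as the first argument.
# text{} is pup's display function for plain text.
extract_text() {
  pup "$1 text{}"
}

# Usage (https://example.com is a placeholder URL):
#   curl -sL https://example.com | extract_text 'h1'
```

Keeping the fetch step outside the helper means the same function works on saved files or on HTML produced by any other command.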

USE CASE 2

Extract all links from an HTML page and pass them into another shell tool for further processing.
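One way this could look, assuming `pup` is installed. The helper name `list_links` and the URL in the usage comment are hypothetical.

```shell
# Hypothetical helper: read HTML on stdin and print each distinct href
# value found on an anchor tag, one per line, sorted.
# attr{href} is pup's display function for a single attribute.
list_links() {
  pup 'a attr{href}' | sort -u
}

# Usage (placeholder URL): count the distinct links on a page.
#   curl -sL https://example.com | list_links | wc -l
```

Because the output is one value per line, it can feed `grep`, `xargs`, or any other line-oriented tool.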

USE CASE 3

Convert HTML table content to JSON using pup's JSON output mode and then filter it with jq.
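A rough sketch of the table-to-JSON idea, assuming both `pup` and `jq` are installed. It also assumes that `json{}` emits an array of objects with child elements under `children` and text under `text`; confirm that shape against your installed version before relying on it. The helper name `table_rows` is made up for this example.

```shell
# Hypothetical helper: read HTML on stdin and print each table row as a
# compact JSON array of its cell texts.
# Assumption: each object from json{} nests child elements under
# "children" and stores trimmed text under "text".
table_rows() {
  pup 'tr json{}' | jq -c '.[] | [.children[]?.text]'
}

# Usage: keep only the rows whose second column equals "30".
#   cat page.html | table_rows | jq -c 'select(.[1] == "30")'
```

pup's JSON mirrors the HTML element tree rather than producing row records, so the `jq` step is what flattens each row into a list of cell values.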

USE CASE 4

Pull attribute values or text content from HTML inside a shell script without a full scraping framework.

Tech stack

Go

Getting it running

Difficulty: easy · Time to first run: 5 min

In plain English

pup is a command line tool for extracting information from HTML pages. It reads HTML from standard input, applies filters you specify, and prints the results to standard output. The tool was inspired by jq, a popular utility for working with JSON data in the terminal, and follows the same pattern of piping data through filters.

The filters use CSS selectors, which are the same rules that web developers write to style pages. If you know how to write something like "select all links inside a table" in a stylesheet, you can write the same instruction for pup. This means you can grab specific elements by tag name, by CSS class, by HTML ID, by attribute value, or by their position among siblings. pup supports a broad set of these selectors, including more advanced ones like ":contains" for finding elements by text content.

Beyond returning matching HTML, pup offers a few output formats. You can extract just the plain text from matched elements, print the value of a specific attribute such as a URL or an ID, or convert the matched HTML into JSON. The JSON output includes the tag name, text content, and all attributes of each matched element, which makes it easy to pass the result into other tools like jq for further processing.

Installation is straightforward. You can download a prebuilt binary from the releases page, install it with Homebrew on a Mac, or build it from source if you have Go installed. Once installed, the typical workflow is to pipe the output of a tool like curl into pup, followed by a selector to pull out what you need.

This is a focused utility with no server component, no configuration files, and no ongoing setup. You run it, pass it HTML, and get structured output back.
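The selector types and output formats described above can be tried against a small inline document. This is a sketch that assumes `pup` is installed; the sample HTML and the helper name `try_selector` are invented for illustration.

```shell
# Sample HTML, made up for this example.
sample_html='<ul id="menu"><li class="item">One</li><li class="item special">Two</li><li>Three</li></ul>'

# Hypothetical helper: run a pup filter against the sample HTML.
try_selector() {
  printf '%s' "$sample_html" | pup "$1"
}

# By ID and class:        try_selector '#menu li.special text{}'
# By sibling position:    try_selector 'li:nth-child(3) text{}'
# By text content:        try_selector 'li:contains("Thr") text{}'
# By attribute, as JSON:  try_selector 'li[class="item"] json{}'
# Print an attribute:     try_selector 'ul attr{id}'
```

Each filter is a CSS selector, optionally followed by one display function (`text{}`, `attr{...}`, or `json{}`) that controls how the matches are printed.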

Copy-paste prompts

Prompt 1
Write a one-liner using curl and pup that fetches a webpage and extracts all href values from anchor tags inside a nav element.
Prompt 2
I have HTML output from a command. Show me how to use pup to select elements by CSS class and print only their text content.
Prompt 3
How do I use pup to convert an HTML table to JSON and then pipe it into jq to filter specific rows by a column value?
Prompt 4
Show me pup selector syntax for matching elements by attribute value, by sibling position, and by text content using :contains.

Verify against the repo before relying on details.