explaingit

postlight/parser

5,784 | JavaScript | Audience · developer | Complexity · 2/5 | License | Setup · easy

TLDR

A JavaScript library that takes a URL and returns just the article title, author, date, and clean body text, stripping ads, nav menus, and everything else a reader does not need.

Mindmap

mindmap
  root((Postlight Parser))
    What it does
      Extract article content
      Strip ads and nav
      Return clean data
    Output Fields
      Title and author
      Publish date
      Body text excerpt
    Output Formats
      HTML default
      Markdown
      Plain text
    Customization
      Custom extractors
      CSS selector rules
      Pass custom headers

Things people build with this

USE CASE 1

Build a read-later app that stores clean article text instead of full web pages.

USE CASE 2

Feed extracted article content into an AI summarizer or topic classifier without noise from ads and navigation.

USE CASE 3

Create a distraction-free reading view in a browser extension by stripping page clutter on the fly.

USE CASE 4

Scrape article metadata like author and publish date from a list of URLs for a content aggregation pipeline.

Tech stack

JavaScript, Node.js

Getting it running

Difficulty · easy | Time to first run · 5 min

Install via npm and call one function with a URL, no configuration required.

Use freely for any purpose, including commercial projects, under either the Apache 2.0 or the MIT license, provided you keep the copyright notice.

In plain English

Postlight Parser is a JavaScript library that takes a URL and returns the meaningful content from that page as clean, structured data. Rather than getting back an entire web page filled with navigation menus, ads, and unrelated links, you receive only the parts a reader cares about: the article text, title, author name, publication date, a short excerpt, and the lead image URL. Fields the parser cannot find are returned as null.

The main use is stripping noise from articles so the content can be displayed in a cleaner reading view, stored in a database, or processed further by another tool. The library powers a browser extension called Postlight Reader, which applies this extraction in real time to give a distraction-free reading mode on any site.

You can request the extracted content in three formats: HTML (the default), Markdown, or plain text. Custom request headers can be passed along for pages that require cookies or a specific browser identity string. The parser can also work on HTML you have already fetched yourself, rather than fetching the URL on its own.

Sites often have unusual markup that causes generic parsing to fail. Postlight Parser addresses this by allowing custom extractors written with JavaScript and CSS selectors for specific domains. Many pre-built extractors for popular sites are included in the project, and contributors can add new ones by following a documented process.

A command-line tool is included alongside the library, so you can parse a URL from a terminal without writing any code. The library is dual-licensed under Apache 2.0 and MIT.

Copy-paste prompts

Prompt 1
Using @postlight/parser, write a Node.js script that takes a list of article URLs from a file and saves each one as clean markdown text.
Prompt 2
Show me how to write a custom extractor for postlight/parser that handles a site with unusual markup using CSS selectors.
Prompt 3
How do I use Postlight Parser to extract article content from HTML I have already fetched myself, without having the library make a network request?
Prompt 4
Build a simple read-later API endpoint that accepts a URL, runs it through Postlight Parser, and returns the title and body as JSON.

Verify against the repo before relying on details.