Build a read-later app that stores clean article text instead of full web pages.
Feed extracted article content into an AI summarizer or topic classifier without noise from ads and navigation.
Create a distraction-free reading view in a browser extension by stripping page clutter on the fly.
Scrape article metadata like author and publish date from a list of URLs for a content aggregation pipeline.
Install via npm and call one function with a URL, no configuration required.
Postlight Parser is a JavaScript library that takes a URL and returns the meaningful content from that page as clean, structured data. Rather than getting back an entire web page filled with navigation menus, ads, and unrelated links, you receive only the parts a reader cares about: the article text, title, author name, publication date, a short excerpt, and the lead image URL. Fields the parser cannot find are returned as null. The main use is stripping noise from articles so the content can be displayed in a cleaner reading view, stored in a database, or processed further by another tool. The library powers a browser extension called Postlight Reader, which applies this extraction in real time to give a distraction-free reading mode on any site. You can request the extracted content in three formats: HTML (the default), Markdown, or plain text. Custom request headers can be passed along for pages that require cookies or a specific browser identity string. The parser can also work on HTML you have already fetched yourself, rather than fetching the URL on its own. Sites often have unusual markup that causes generic parsing to fail. Postlight Parser addresses this by allowing custom extractors written with JavaScript and CSS selectors for specific domains. Many pre-built extractors for popular sites are included in the project, and contributors can add new ones by following a documented process. A command-line tool is included alongside the library, so you can parse a URL from a terminal without writing any code. The library is dual-licensed under Apache 2.0 and MIT.
← postlight on gitmyhub — every repo by this author, as a profile.
Verify against the repo before relying on details.