explaingit

mozilla/readability

11,182 | JavaScript | Audience · developer | Complexity · 2/5 | Setup · easy

TLDR

Mozilla's JavaScript library that strips a webpage down to just its article text and images. It is the same code that powers Firefox's Reader View, available as a standalone package for your own projects.

Mindmap

mindmap
  root((readability))
    What it does
      Strips page clutter
      Returns article object
      Powers Firefox Reader
    Output Fields
      Title
      Clean HTML
      Plain text
      Author and date
    Environments
      Browser
      Node.js
    Key Functions
      Readability parse
      isProbablyReaderable
    Audience
      Web developers
      App builders
      AI pipeline devs

Things people build with this

USE CASE 1

Build a read-it-later app that saves clean article text from any URL, free of ads and navigation clutter.

USE CASE 2

Feed clean article text into an AI summarization or translation pipeline without dealing with raw HTML noise.

USE CASE 3

Add a reader-mode toggle to a browser extension that shows only the article body when users request it.

USE CASE 4

Check whether a page looks like an article before processing it, using the isProbablyReaderable fast-check function.
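The fast check in use case 4 can be sketched roughly as follows. This is a minimal illustration, not the library's own code: parseIfReaderable is a hypothetical helper name, and the lib parameter is assumed to be the object exported by the @mozilla/readability package (it is passed in so the helper can also be exercised with a stub).

```javascript
// parseIfReaderable is a hypothetical helper, not part of Readability.
// `lib` is assumed to expose { Readability, isProbablyReaderable },
// which is what require("@mozilla/readability") exports.
function parseIfReaderable(doc, lib) {
  // Cheap heuristic first: skip pages that do not look like articles.
  if (!lib.isProbablyReaderable(doc)) {
    return null;
  }
  // Full parse only for likely articles. parse() may itself return null.
  return new lib.Readability(doc).parse();
}

// Example wiring in Node.js (assumes the package is installed and that
// `document` was created elsewhere, for instance with jsdom):
//   const lib = require("@mozilla/readability");
//   const article = parseIfReaderable(document, lib);
//   if (article === null) { /* not an article, or nothing extracted */ }
```

A null result here covers both outcomes, "does not look like an article" and "parsing found nothing", so callers that need to tell them apart would have to return something richer.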

Tech stack

JavaScript · Node.js · jsdom

Getting it running

Difficulty · easy | Time to first run · 5 min

Always sanitize Readability's HTML output before displaying it to users. The library does not block malicious HTML on its own.
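One possible way to act on that warning is sketched below. It assumes DOMPurify as the sanitizer, which is one common choice and not the only one. sanitizeArticle is a hypothetical helper name, and the purify parameter is assumed to be any object with a sanitize(html) method, such as a DOMPurify instance.

```javascript
// sanitizeArticle is a hypothetical helper, not part of Readability.
// `purify` is assumed to be a DOMPurify instance, or anything else that
// offers sanitize(html) and returns a cleaned HTML string.
function sanitizeArticle(article, purify) {
  if (article === null) {
    return null; // Readability found nothing to extract
  }
  // Return a copy so the original article object stays untouched.
  return { ...article, content: purify.sanitize(article.content) };
}

// Example wiring in Node.js (assumes `npm install dompurify jsdom`):
//   const createDOMPurify = require("dompurify");
//   const { JSDOM } = require("jsdom");
//   const purify = createDOMPurify(new JSDOM("").window);
//   const safeArticle = sanitizeArticle(article, purify);
```

Only the content field is sanitized in this sketch, because that is the HTML that would be injected into a page. Other fields would still need escaping if they are rendered as HTML.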

No license information stated in the explanation.

In plain English

Readability.js is the JavaScript library that powers Firefox's Reader View, the feature that strips a cluttered webpage down to just its article text and images. Mozilla has published it as a standalone package so developers can use it in their own projects without relying on Firefox itself.

The core idea is simple: you give it a web page's document, and it returns a clean article object. That object contains the article title, the cleaned-up HTML content, the plain text version (with all HTML tags removed), the author, the publication date, the language, and a short excerpt. One function call does most of the work.

The library runs in web browsers and also in server-side JavaScript environments like Node.js. In a browser you typically already have a document object to pass in. In Node.js you need a helper library to create one from raw HTML, and the README shows how to do that with a commonly used tool called jsdom.

There are a handful of optional settings you can adjust: how long an article must be before Readability bothers returning a result, whether to keep or strip CSS class names from the output, which video URLs to allow, and how to convert the final content to a string.

A companion function called isProbablyReaderable gives a fast yes-or-no check on whether a page looks like an article at all, which is useful if you want to avoid running the full parsing logic on pages that are clearly not articles.

One important note from the README: the parsing step modifies the original document by removing elements. If you need the original page intact after parsing, clone the document first. The README also strongly recommends running the output through a sanitizer library before displaying it to users, since the library itself does not attempt to block malicious HTML.
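The flow described above might look roughly like this in Node.js. It is a sketch under stated assumptions, not the README's exact code: documentFromHtml, extractArticle, and toSummary are hypothetical helper names, the packages @mozilla/readability and jsdom are assumed to be installed, and the friendlier field names in toSummary are illustrative only.

```javascript
// Sketch only. Assumes `npm install @mozilla/readability jsdom` has been run.
// All three function names below are hypothetical helpers.

// Build a document from raw HTML, as needed outside a browser.
function documentFromHtml(html, url) {
  // Required lazily so this file loads even before jsdom is installed.
  const { JSDOM } = require("jsdom");
  // Passing the page URL lets relative links resolve against it.
  return new JSDOM(html, { url }).window.document;
}

// Parse a clone of `doc`, because parse() removes elements from the
// document it is given. `options` is handed to Readability unchanged,
// for example { charThreshold: 500, keepClasses: false }.
function extractArticle(doc, options = {}) {
  const { Readability } = require("@mozilla/readability");
  const article = new Readability(doc.cloneNode(true), options).parse();
  return article === null ? null : toSummary(article);
}

// Map the library's field names onto the terms used in the text above.
function toSummary(article) {
  return {
    title: article.title,
    author: article.byline,
    published: article.publishedTime,
    language: article.lang,
    excerpt: article.excerpt,
    html: article.content,     // cleaned-up HTML, still unsanitized
    text: article.textContent, // plain text with tags removed
  };
}

// Example (not run here):
//   const doc = documentFromHtml(htmlString, "https://example.com/post");
//   const summary = extractArticle(doc);
```

In a browser, documentFromHtml is unnecessary: the page's own document can be passed to extractArticle, and the clone keeps the visible page intact.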

Copy-paste prompts

Prompt 1
Using Mozilla's Readability library in Node.js with jsdom, write a script that takes a URL as input, fetches the page, and returns the clean article title and text.
Prompt 2
I'm building a read-it-later app. Show me how to use Readability.js to extract the article content from a fetched HTML string and display just the title, author, and body.
Prompt 3
How do I use isProbablyReaderable to skip non-article pages before running the full Readability parse? Give me the code.
Prompt 4
I need to process many web pages safely. Show me how to clone the document before passing it to Readability so the original page is not modified, and how to sanitize the output.

Verify against the repo before relying on details.