explaingit

jina-ai/reader

10,804 | TypeScript | Audience · developer | Complexity · 2/5 | Setup · easy

TLDR

An open-source tool that converts any web page, PDF, or Office document into clean markdown when you prepend a URL prefix, so AI models can read and reason about the real content without page clutter and with no login required.

Mindmap

mindmap
  root((Jina Reader))
    What it does
      Web page to markdown
      PDF and doc parsing
      AI-ready clean text
    How to use
      Prepend URL prefix
      No account needed
      Request header options
    Content types
      Web pages
      PDFs and Office files
      Images with captions
    Self-hosting
      Open source version
      Optional storage cache

Things people build with this

USE CASE 1

Feed any article or web page to a language model as clean text by prepending https://r.jina.ai/ to its URL. No account or API key is needed.

USE CASE 2

Convert a PDF hosted online into structured markdown so an AI model can answer questions about its content.

USE CASE 3

Build a web-research pipeline that fetches the full text of top search results for any query using the s.jina.ai companion endpoint.
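The URL shapes behind these use cases can be sketched in a few lines. This is a minimal illustration, not code from the repository: `readerUrl` and `searchUrl` are hypothetical helper names, and percent-encoding the search query is an assumption about what s.jina.ai expects. Authentication and rate limits, if any, are not modeled.

```typescript
// Base URLs as named in the text above.
const READER_BASE = "https://r.jina.ai/";
const SEARCH_BASE = "https://s.jina.ai/";

/** Hypothetical helper: prepend the Reader prefix to a page, PDF, or document URL. */
function readerUrl(targetUrl: string): string {
  const trimmed = targetUrl.trim();
  // Reject anything that is not an absolute http(s) URL before building the request.
  if (!/^https?:\/\//i.test(trimmed)) {
    throw new Error(`Expected an absolute http(s) URL, got: ${targetUrl}`);
  }
  return READER_BASE + trimmed;
}

/** Hypothetical helper: build a search URL. ASSUMPTION: the query is percent-encoded. */
function searchUrl(query: string): string {
  const trimmed = query.trim();
  if (trimmed.length === 0) {
    throw new Error("Query must not be empty");
  }
  return SEARCH_BASE + encodeURIComponent(trimmed);
}

console.log(readerUrl("https://example.com/article"));
// prints "https://r.jina.ai/https://example.com/article"
console.log(searchUrl("open source web readers"));
// prints "https://s.jina.ai/open%20source%20web%20readers"
```

Either string can then be requested with any HTTP client, and the response body is the markdown text to hand to a model.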

Tech stack

TypeScript · Node.js · headless Chrome

Getting it running

Difficulty · easy | Time to first run · 5 min

In plain English

Reader is an open-source tool from Jina AI that turns any web page, PDF, or document into clean text that AI models can work with. The idea is simple: prepend https://r.jina.ai/ to any URL, and Reader fetches the page, strips away the noise, and returns structured markdown. You do not need an account or API key to start using it.

The tool handles more than just web pages. PDFs hosted anywhere are parsed automatically. Word, Excel, and PowerPoint files can be uploaded directly or linked by URL. Images get a short text caption so that AI models without vision support can still reason about them. Under the hood, Reader picks between headless Chrome (for JavaScript-heavy pages) and a lightweight curl-based fetcher, choosing whichever is more appropriate for the page.

The search side of the project works through a companion service at s.jina.ai. Pass it any query and it fetches the top five web results, visits each one, and returns their full text rather than just titles and snippets. That means the AI model reading those results gets real article content, not search-engine previews.

For developers who want more control, Reader accepts request headers that adjust its behavior: you can target specific elements on a page with a CSS selector, set a timeout, cap the number of output tokens, or choose whether to use the browser renderer or the lightweight fetcher. There is an interactive code builder on the project website that lets you explore the available options before writing any code.

This repository is the open-source version of the same code running on the live service. The hosted SaaS adds a storage layer that is not included here, but you can run Reader locally in a stateless mode or with optional object-storage caching.
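The request-header options described above (CSS selector, timeout, token cap, renderer choice) might be assembled as in the sketch below. Everything here is illustrative: `ReaderOptions`, `buildReaderHeaders`, and `readAsMarkdown` are hypothetical names, and the header names and engine values are assumptions recalled from Jina's public documentation, so check them against the repo or the code builder before relying on them. The fetch function is passed in as a parameter so the sketch makes no network call on its own.

```typescript
// Hypothetical option bag; field names are illustrative, not Reader's own API.
interface ReaderOptions {
  targetSelector?: string;       // CSS selector for the part of the page to keep
  timeoutSeconds?: number;       // how long Reader may spend on the page
  tokenBudget?: number;          // cap on output tokens
  engine?: "browser" | "direct"; // ASSUMPTION: browser renderer vs lightweight fetcher
}

// ASSUMPTION: these header names are not confirmed by the text above.
function buildReaderHeaders(opts: ReaderOptions): Record<string, string> {
  const headers: Record<string, string> = {};
  if (opts.targetSelector) headers["X-Target-Selector"] = opts.targetSelector;
  if (opts.timeoutSeconds !== undefined) headers["X-Timeout"] = String(opts.timeoutSeconds);
  if (opts.tokenBudget !== undefined) headers["X-Token-Budget"] = String(opts.tokenBudget);
  if (opts.engine) headers["X-Engine"] = opts.engine;
  return headers;
}

// Minimal shape of a fetch-like function, so the sketch does not depend on globals.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

/** Request a page through the hosted prefix and return the markdown body. */
async function readAsMarkdown(
  targetUrl: string,
  opts: ReaderOptions,
  fetchImpl: FetchLike
): Promise<string> {
  const res = await fetchImpl("https://r.jina.ai/" + targetUrl, {
    headers: buildReaderHeaders(opts),
  });
  if (!res.ok) {
    throw new Error(`Reader request failed with status ${res.status}`);
  }
  return res.text();
}

// Only options that are set produce a header.
console.log(buildReaderHeaders({ targetSelector: "article", timeoutSeconds: 10 }));
```

In Node.js 18 or later, the built-in `fetch` can be passed as `fetchImpl`, for example `readAsMarkdown("https://example.com/post", { targetSelector: "article" }, fetch)`. Injecting the fetcher also makes it easy to substitute a stub in tests or to point the same code at a self-hosted instance.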

Copy-paste prompts

Prompt 1
Using Jina Reader, how do I convert a web article into clean markdown I can paste into Claude or ChatGPT? Show me the exact URL format to use.
Prompt 2
I want to self-host Jina Reader to convert internal documents to AI-readable text without sending them to a third-party service. How do I run it locally?
Prompt 3
How do I use Jina Reader request headers to target only the main article body of a page and ignore the navigation menu and footer?
Prompt 4
Using Jina Reader's search endpoint at s.jina.ai, how do I get the full text of the top five results for a query and feed them into an AI pipeline?

Verify against the repo before relying on details.