explaingit

sophomoresty/bpc-fetch

104 JavaScript Audience · developer Complexity · 2/5 Setup · moderate

TLDR

A command-line tool that fetches full text from 930+ paywalled news sites and saves articles as clean Markdown, using a headless browser and fallback strategies.

Mindmap

mindmap
  root((bpc-fetch))
    What it does
      Bypasses paywalls
      Fetches full articles
      Saves as Markdown
    Coverage
      936 news sites
      40 countries
      Finance and science
    How it works
      Googlebot spoofing
      Referer manipulation
      Archive.org fallback
    Usage
      CLI tool
      JSON output
      Windows exe

Things people build with this

USE CASE 1

Download the full text of a paywalled article from the NYT or Financial Times as Markdown for research or AI summarization

USE CASE 2

Batch-fetch recent articles from a news site by scraping its RSS feed or sitemap with a single command

USE CASE 3

Run a cross-site keyword search across multiple publications and download matching articles in one batch

USE CASE 4

Pipe article JSON output into an AI processing pipeline using bpc-fetch's structured stdout format
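For the pipeline use case, the consuming side might look like the minimal sketch below. It assumes the fetched results arrive as a single JSON object or a JSON array on a text stream such as standard input. The field names `title` and `markdown` are hypothetical placeholders, not confirmed by the source, so check the tool's actual output before relying on them.

```python
import json
import sys
from typing import Iterator, TextIO


def read_articles(stream: TextIO) -> Iterator[dict]:
    """Yield article records from a stream holding one JSON object or a JSON array."""
    payload = json.load(stream)
    records = payload if isinstance(payload, list) else [payload]
    for record in records:
        # Skip anything that is not a JSON object.
        if isinstance(record, dict):
            yield record


def summarize(record: dict) -> str:
    """Build a one-line summary of a record.

    "title" and "markdown" are hypothetical field names (an assumption).
    """
    title = record.get("title", "(untitled)")
    body = record.get("markdown", "")
    return f"{title}: {len(body.split())} words"


def main(stream: TextIO = sys.stdin) -> None:
    """Read records from the stream and print one summary line per article."""
    for article in read_articles(stream):
        print(summarize(article))
```

Calling `main()` at the end of a script would let it sit at the receiving end of a shell pipe. Because progress messages are sent to standard error, only the JSON should reach this reader.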

Tech stack

Python · Playwright · Chromium · PyInstaller

Getting it running

Difficulty · moderate Time to first run · 30min

Requires a pip install plus Playwright and Chromium. A Windows executable is also available that downloads Chromium automatically.

In plain English

bpc-fetch is a command-line tool that retrieves full articles from more than 930 subscription-only news and magazine websites and saves each one as clean Markdown with its images preserved. The project replicates what the browser extension Bypass Paywalls Clean does, but runs entirely from a terminal, with no extension installed.

The tool supports sources in more than 40 countries: financial press such as the Economist, Financial Times, Bloomberg, and the Wall Street Journal; US news sites including the New York Times and the Washington Post; European publications in German, French, Italian, and Spanish; and science and technology outlets including Wired, Nature, and MIT Technology Review. The full list covers 936 sites and can be viewed by running a command in the terminal.

For each article URL you provide, the tool tries a sequence of retrieval strategies: impersonating Googlebot or Bingbot, manipulating the HTTP Referer header, running JavaScript inside headless Chromium, or falling back to the Internet Archive if the direct approaches fail. It starts with the strategy best suited to the site and works down the chain until it gets content.

Beyond single articles, bpc-fetch can discover recent articles from a site by checking its RSS feed, XML sitemap, or rendered homepage. A cross-site search mode lets you search by keyword, filter by time range, and download matching articles in one batch.

Output is designed for automated pipelines: results go to standard output as JSON, progress messages go to standard error, and each response includes a suggestion for the next command to run.

Installation uses pip and requires Playwright and Chromium. A self-contained Windows executable is also available in Releases and downloads Chromium automatically on first run. The README does not clearly state a license.
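The idea of working through retrieval strategies until one returns content can be illustrated with a generic fallback chain. This is a sketch of the pattern only, not the project's actual code: the function `fetch_with_fallback`, the strategy names, and the stub bodies are all hypothetical, and the stubs make no network requests.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A strategy takes a URL and returns article text, or None if nothing usable came back.
Strategy = Callable[[str], Optional[str]]


def fetch_with_fallback(
    url: str, strategies: Sequence[Tuple[str, Strategy]]
) -> Tuple[str, str]:
    """Try each (name, strategy) in order; return (name, text) for the first success."""
    errors: List[str] = []
    for name, strategy in strategies:
        try:
            text = strategy(url)
        except Exception as exc:  # a failing strategy should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))


# Hypothetical stubs standing in for real strategies (assumptions, no network I/O).
def crawler_user_agent(url: str) -> Optional[str]:
    raise ConnectionError("blocked")


def referer_header(url: str) -> Optional[str]:
    return None


def headless_browser(url: str) -> Optional[str]:
    return None


def archive_fallback(url: str) -> Optional[str]:
    return f"archived copy of {url}"


CHAIN: List[Tuple[str, Strategy]] = [
    ("crawler_user_agent", crawler_user_agent),
    ("referer_header", referer_header),
    ("headless_browser", headless_browser),
    ("archive_fallback", archive_fallback),
]
```

With these stubs, the first three strategies fail or return nothing, so `fetch_with_fallback(url, CHAIN)` falls through to `archive_fallback`. Collecting each failure reason makes it easier to see why a given site needed the last resort.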

Copy-paste prompts

Prompt 1
Using bpc-fetch, write a Python script that searches The Economist for articles about climate policy from the last 30 days and saves each one as a separate Markdown file named after the article title.
Prompt 2
I have a list of 20 paywalled article URLs from different news sites. Write a shell script using bpc-fetch to download each one as Markdown and handle failures gracefully.
Prompt 3
How do I use bpc-fetch in pipe mode to feed article content into a Python script that summarizes it with an LLM? Show a complete example pipeline from URL to summary.
Prompt 4
How do I check if bpc-fetch supports a specific news site, and what do I do if the site is not in the list of 936 supported sources?

Verify against the repo before relying on details.