hartator/wayback-machine-downloader

★ 5,875 | Ruby | Audience · developer | Complexity · 1/5 | Setup · easy

Mindmap

mindmap
  root((repo))
    What It Does
      Downloads archived sites
      Recreates file structure
      Retrieves original files
    Input Options
      Website URL
      Time range filters
      File pattern filters
      Exact URL mode
    Performance
      Concurrency setting
      Snapshot page limit
      List-only mode
    Setup
      Ruby gem install
      Docker support
    Use Cases
      Site archival recovery
      Historical research
      Offline backup

Things people build with this

USE CASE 1

Download a complete archived copy of an old website from the Internet Archive to your computer

USE CASE 2

Retrieve only specific file types from a site's archive using filename or regex filters

USE CASE 3

Preview what the Internet Archive holds for a site in JSON format before committing to a full download

USE CASE 4

Restore a site's historical file structure within a specific date range using timestamp flags
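
As a rough sketch of the date-range use case: Wayback Machine URLs carry 14-digit timestamps (YYYYMMDDhhmmss), and the tool's `--from` and `--to` flags take values in that form. The helper name `to_wayback_ts`, the site `http://example.com`, and the dates are illustrative assumptions, not taken from the repo. The final command is only printed, so nothing is downloaded.

```shell
# Turn a YYYY-MM-DD date into a 14-digit Wayback timestamp (YYYYMMDDhhmmss).
# The optional second argument is the time of day (hhmmss); default is 000000.
# to_wayback_ts is a hypothetical helper, not part of the tool.
to_wayback_ts() {
  printf '%s%s\n' "$(printf '%s' "$1" | tr -d '-')" "${2:-000000}"
}

FROM=$(to_wayback_ts 2010-01-01)          # start of the window
TO=$(to_wayback_ts 2012-12-31 235959)     # end of the window

# Print the command instead of running it; drop "echo" to download for real.
echo wayback_machine_downloader http://example.com --from "$FROM" --to "$TO"
```

Check the exact flag spelling against the README before relying on it.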

Tech stack

Ruby · Docker

Getting it running

Difficulty · easy | Time to first run · 5 min

Requires Ruby. Alternatively, it runs in Docker, which skips the Ruby install entirely.
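
A minimal sketch of both setup routes, assuming the gem name `wayback_machine_downloader` and the Docker image `hartator/wayback-machine-downloader` as I recall them from the project README; verify both against the repo. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences so the block only prints commands by default, and `http://example.com` is a placeholder site.

```shell
# Dry-run helper (hypothetical): prints each command instead of executing it.
# Set DRY_RUN=0 in the environment to actually run the commands.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

SITE="http://example.com"   # placeholder site

# Route 1: install the gem (needs Ruby), then download the site.
# Files land under ./websites/ in a subfolder named after the domain.
run gem install wayback_machine_downloader
run wayback_machine_downloader "$SITE"

# Route 2: Docker, no local Ruby needed. The volume mount keeps the
# downloaded files on the host under ./websites.
run docker run --rm -it -v "$PWD/websites:/websites" \
  hartator/wayback-machine-downloader "$SITE"
```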

In plain English

Wayback Machine Downloader is a Ruby command-line tool that retrieves a full copy of a website from the Internet Archive, which stores snapshots of websites going back decades. You give it a web address and it downloads every file it can find for that site, recreating the original directory structure on your computer. The files saved are the originals, not reformatted versions, so links and URLs work the same way as they did on the live site.

Installation requires Ruby and a single gem install command. Basic use is equally simple: run the command with the website URL and it places all downloaded files inside the websites/ folder, in a subfolder named after the domain. By default, it grabs the most recent version of each file the archive holds.

Several optional flags give you control over what gets downloaded. You can narrow the download to files captured within a specific time window using from and to timestamps, written in the same format that appears in any Wayback Machine URL. You can also restrict downloads to files matching a string or a regular expression, or exclude files by the same method. Downloading a single page rather than a whole site is possible with the exact-url flag.

Performance options include concurrency, which lets you download multiple files simultaneously to speed things up, and a snapshot-pages setting that controls how many pages of archive history the tool searches through. A list-only mode prints the files and their timestamps in JSON format without downloading anything, which is useful for inspecting what the archive holds before committing to a full download.

The tool also runs inside Docker if you prefer not to install Ruby directly. By default it skips error pages and redirects, but an option exists to include those too. The README is practical and example-driven, covering each flag with a concrete command.
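
The options described above might look like this on the command line. This is a sketch from memory of the README, so treat the flag names (`--only`, `--exclude`, `--exact-url`, `--concurrency`, `--maximum-snapshot`, `--list`, `--all`) as things to confirm there. The `run` helper, the `DRY_RUN` variable, the site, the page path, and the numeric values are illustrative assumptions. Commands are printed rather than executed by default.

```shell
# Dry-run helper (hypothetical): prints each command instead of executing it.
# Note that the printed form drops shell quoting. Set DRY_RUN=0 to run for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

SITE="http://example.com"   # placeholder site

# Keep only files matching a regex (a filter wrapped in slashes is a regex).
run wayback_machine_downloader "$SITE" --only '/\.(jpg|png)$/i'

# Skip files matching a regex instead.
run wayback_machine_downloader "$SITE" --exclude '/\.(gif|jpg|jpeg)$/i'

# Download one page only (about.html is a made-up path).
run wayback_machine_downloader "$SITE/about.html" --exact-url

# Download several files at once.
run wayback_machine_downloader "$SITE" --concurrency 20

# Search more pages of snapshot history than the default.
run wayback_machine_downloader "$SITE" --maximum-snapshot 300

# Print the file list with timestamps as JSON; nothing is downloaded.
run wayback_machine_downloader "$SITE" --list

# Include error pages and redirects as well.
run wayback_machine_downloader "$SITE" --all
```

Running with `--list` first is a cheap way to see how much the archive holds before choosing filters or a concurrency level.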

Copy-paste prompts

Prompt 1

Give me the wayback-machine-downloader command to download all files from example.com that were archived between January 2010 and December 2012.

Prompt 2

Show me how to use wayback-machine-downloader with a regular expression filter to download only .jpg and .png images from a site's archive.

Prompt 3

Write the command to run wayback-machine-downloader in list-only mode to inspect what files the Internet Archive holds for a domain, without downloading anything.

Prompt 4

How do I run wayback-machine-downloader using Docker so I don't need to install Ruby directly on my machine?

Prompt 5

Give me the wayback-machine-downloader command to download a single specific archived page rather than the entire site.

Verify against the repo before relying on details.