soxoj/kronikier

13 | Python | Audience · researcher | Complexity · 3/5 | License · MIT | Setup · easy

TLDR

CLI tool that extracts historical email addresses and phone numbers from archived website snapshots via the Wayback Machine, with timeline data and CSV export.

Mindmap

mindmap
  root((kronikier))
    Data Recovery
      Wayback Machine
      Historical contacts
      Timeline tracking
    Input Modes
      Single URL
      Batch processing
      Exhaustive scan
    Output Format
      Terminal table
      CSV export
      Normalized data
    Performance
      Request rate limit
      Disk caching
      Polite crawling

Things people build with this

USE CASE 1

Recover contact information for websites that have been taken down or significantly changed

USE CASE 2

Research historical business ownership and location changes via archived contact details

USE CASE 3

Trace a domain's history during legal investigations by extracting contact details from the publicly archived record

USE CASE 4

Track when and how contact information changed over time for compliance or audit purposes

Tech stack

Python · Wayback Machine API · CSV export · Command-line interface

Getting it running

Difficulty · easy | Time to first run · 5 min

Install via pip and run it with a domain name. It needs internet access to query the Wayback Machine. By default it caches snapshots locally and writes a standard CSV file to the current directory.
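The local caching mentioned above can be pictured with a small sketch. This is not kronikier's code: the class name `CachedFetcher`, its parameters, and the hashed-file-name layout are all assumptions made for illustration. It wraps any fetch function with an on-disk cache and a minimum delay between network calls, so a repeated request for the same snapshot never goes back to the network.

```python
import hashlib
import time
from pathlib import Path


class CachedFetcher:
    """Hypothetical helper: disk cache plus a simple request throttle."""

    def __init__(self, fetch, cache_dir, max_per_second=4.0):
        self.fetch = fetch                      # callable: url -> bytes
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.min_interval = 1.0 / max_per_second
        self._last_call = None                  # time of the last network call

    def _path_for(self, url):
        # Hash the URL so any snapshot address maps to a safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / digest

    def get(self, url):
        path = self._path_for(url)
        if path.exists():
            return path.read_bytes()            # cache hit: no network, no delay
        if self._last_call is not None:
            elapsed = time.monotonic() - self._last_call
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
        self._last_call = time.monotonic()
        body = self.fetch(url)
        path.write_bytes(body)                  # store for later runs
        return body
```

In real use, `fetch` might be something like `lambda url: urllib.request.urlopen(url).read()`. Passing the fetch function in keeps the sketch testable without network access.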

MIT License: open source and permissive. It allows commercial and private use with attribution.

In plain English

kronikier is a Python command-line tool for recovering historical contact information from archived versions of websites. It queries the Wayback Machine (web.archive.org), which preserves snapshots of web pages over time, and extracts email addresses and phone numbers that appeared on a given domain across all available snapshots. The main use case is research or investigation where a website has since removed its contact details, been taken down, or changed ownership.

You give it a domain name and it works through the available snapshots, pulling out any email or phone number it finds. Results include when each contact first appeared and when it was last seen, so you get a timeline rather than just a list. Output goes to a summary table in the terminal and a CSV file saved to the current directory. The CSV includes nine columns, with both a normalized version of each contact and the raw text as it appeared on the page, which matters for phone numbers where reformatting can introduce errors.

The tool tries to behave politely toward the Internet Archive. It defaults to four requests per second and caches downloaded snapshots on disk so that rerunning a scan does not re-fetch content you already have. The cache lives in a folder in your home directory and can be disabled or cleared through flags.

For investigations, you can scan a single URL to see how one page changed over time, run an exhaustive scan that covers every URL the host ever had, or feed in a file of multiple targets for batch processing.

The README notes that all data extracted is already public, that contacts found in old snapshots may no longer be associated with the domain, and that the tool is intended for use within legal investigations. The test suite uses the Theranos and Enron domains as live regression tests, since their archived contacts are documented historical record. The license is MIT.
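The first-seen and last-seen timeline described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not taken from kronikier: the regular expressions are deliberately simple, the names `Sighting`, `normalize`, `build_timeline`, and `to_csv` are hypothetical, and the five CSV columns are a stand-in for the tool's actual nine-column output. The input is a list of (timestamp, page text) pairs such as one might get after downloading snapshots.

```python
import csv
import io
import re
from dataclasses import dataclass

# Simplified patterns for illustration; real pages need more careful matching.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


@dataclass
class Sighting:
    kind: str          # "email" or "phone"
    normalized: str    # canonical form used for grouping
    raw: str           # text exactly as it first appeared on a page
    first_seen: str    # snapshot timestamp, YYYYMMDDhhmmss
    last_seen: str


def normalize(kind, raw):
    """Lowercase emails; keep only digits and a leading '+' for phones."""
    if kind == "email":
        return raw.lower()
    digits = re.sub(r"\D", "", raw)
    return ("+" if raw.strip().startswith("+") else "") + digits


def build_timeline(snapshots):
    """snapshots: iterable of (timestamp, page_text) pairs, in any order."""
    seen = {}
    for timestamp, text in sorted(snapshots):   # oldest snapshot first
        for kind, pattern in (("email", EMAIL_RE), ("phone", PHONE_RE)):
            for match in pattern.finditer(text):
                raw = match.group(0)
                key = (kind, normalize(kind, raw))
                if key not in seen:
                    seen[key] = Sighting(kind, key[1], raw, timestamp, timestamp)
                else:
                    seen[key].last_seen = timestamp
    return list(seen.values())


def to_csv(sightings):
    """Render sightings as CSV text (hypothetical column names)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["kind", "normalized", "raw", "first_seen", "last_seen"])
    for s in sightings:
        writer.writerow([s.kind, s.normalized, s.raw, s.first_seen, s.last_seen])
    return buf.getvalue()
```

Grouping on the normalized value while keeping the raw text lets differently formatted copies of the same phone number collapse into one row without losing the original spelling, which is the concern the paragraph above raises about reformatting errors.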

Copy-paste prompts

Prompt 1
Extract all email addresses and phone numbers that ever appeared on example.com according to Wayback Machine snapshots
Prompt 2
Run a batch scan of these three domains and give me a CSV with contact timelines for each
Prompt 3
What contact information did this domain have in 2015 vs 2020 according to archived snapshots?
Prompt 4
Set up kronikier with a 2-second request rate and rescan this domain without using my cached snapshots

Verify against the repo before relying on details.