explaingit

s0md3v/photon

12,875 · Python · Audience · ops, devops · Complexity · 2/5 · License · GPL v3 · Setup · easy

TLDR

A fast Python web crawler built for OSINT that automatically extracts URLs, email addresses, API keys, JavaScript endpoints, subdomains, and DNS data from a target website.

Mindmap

mindmap
  root((Photon))
    Extracts
      URLs and links
      Email addresses
      API keys
      JS endpoints
    Features
      Wayback Machine
      DNS data
      JSON export
    Config
      Request delays
      URL filters
      Docker support
    Plugins
      dnsdumpster
      Wayback
      Exporter


Things people build with this

USE CASE 1

Crawl a target website to automatically collect all email addresses and external links for OSINT research.
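Here is a minimal sketch of the extraction step in use case 1. It assumes the page HTML has already been fetched and is not Photon's own code. The helper name `extract`, the simplified email regex, and the same-host rule for separating internal from external links are illustrative assumptions built only on the Python standard library.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Simplified email pattern (an assumption); real address syntax is more permissive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract(page_url, html):
    """Hypothetical helper: return (emails, internal_links, external_links)."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    internal, external = set(), set()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        # Skip mailto:, javascript: and other non-web schemes.
        if parsed.scheme not in ("http", "https"):
            continue
        # Same host as the page counts as internal; anything else is external.
        if parsed.netloc == host:
            internal.add(absolute)
        else:
            external.add(absolute)
    # Scanning the raw HTML also catches addresses inside mailto: links.
    emails = set(EMAIL_RE.findall(html))
    return emails, internal, external
```

The email pattern trades completeness for readability, so it will miss some unusual but valid addresses.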

USE CASE 2

Use the --wayback option to pull historical URLs from archive.org and find pages no longer linked on the live site.
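The --wayback option draws on archive.org. The sketch below shows one way such a lookup can work, and it is an assumption rather than a description of Photon's plugin. It builds a query against the Internet Archive's public CDX API and parses the JSON output, whose first row is a header. Fetching is left out so the parsing can be checked offline.

```python
import json
from urllib.parse import urlencode

# Public CDX endpoint of the Internet Archive's Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain):
    """Build a CDX query that lists archived URLs under `domain`."""
    params = {
        "url": f"{domain}/*",    # prefix match on everything under the domain
        "output": "json",        # rows as JSON arrays, header row first
        "fl": "original",        # only return the originally captured URL
        "collapse": "urlkey",    # one row per unique URL instead of per capture
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body):
    """Turn a CDX JSON response body into a list of original URLs."""
    if not body.strip():
        return []
    rows = json.loads(body)
    # The first row holds the field names (here just ["original"]).
    return [row[0] for row in rows[1:]]
```

To fetch live data, pass the query URL to `urllib.request.urlopen` and decode the response body before calling `parse_cdx_json`. The resulting URLs could then serve as crawl seeds.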

USE CASE 3

Scan JavaScript files on a site to extract hidden API endpoints and exposed authentication tokens.
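To illustrate the idea behind use case 3, here is a regex-based sketch that scans JavaScript source for quoted API-style paths and key-like strings. The patterns are hypothetical heuristics, not the ones Photon ships. The AWS pattern relies on the well-known `AKIA` prefix of AWS access key IDs. A real scanner would need many more rules and would still produce false positives.

```python
import re

# Hypothetical heuristics, not Photon's actual patterns.
# Quoted strings that look like API paths, e.g. "/api/users" or '/v2/items'.
ENDPOINT_RE = re.compile(r"""["'](/(?:api|v\d+)/[A-Za-z0-9_\-/.]*)["']""")
# AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Assignments such as apiKey = "..." or "token": "..." with a long value.
GENERIC_KEY_RE = re.compile(
    r"""(?i)(?:api[_-]?key|token|secret)["']?\s*[:=]\s*["']([A-Za-z0-9_\-]{16,})["']"""
)


def scan_js(source):
    """Return candidate endpoints and secrets found in JavaScript source."""
    return {
        "endpoints": sorted(set(ENDPOINT_RE.findall(source))),
        "aws_keys": sorted(set(AWS_KEY_RE.findall(source))),
        "generic_keys": sorted(set(GENERIC_KEY_RE.findall(source))),
    }
```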

USE CASE 4

Run Photon in Docker to collect DNS and subdomain data from a target domain without installing dependencies.
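One way to derive subdomain references, whether the crawler runs natively or in Docker, is to filter the hostnames of the URLs collected during a crawl. The sketch below assumes a plain list of collected URLs. `subdomains_of` is a hypothetical helper, and the sketch covers only subdomains seen in links, not the DNS records Photon gathers through its dnsdumpster plugin.

```python
from urllib.parse import urlparse


def subdomains_of(domain, urls):
    """Hypothetical helper: hostnames in `urls` that are subdomains of `domain`."""
    domain = domain.lower().rstrip(".")
    found = set()
    for url in urls:
        # .hostname is already lowercased; it is None for URLs without a host.
        host = (urlparse(url).hostname or "").rstrip(".")
        # Require a dot boundary so "notexample.com" does not match "example.com".
        if host.endswith("." + domain):
            found.add(host)
    return sorted(found)
```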

Tech stack

Python, Docker

Getting it running

Difficulty · easy · Time to first run · 5 min
Free to use and modify under GPL v3. If you distribute modified versions, they must also be released as open source under the same license.

In plain English

Photon is a fast web crawler written in Python, built for OSINT (open-source intelligence) gathering. You point it at a website, and it automatically follows links and collects information that may be useful for security research or reconnaissance.

During a crawl, the tool extracts a wide range of data: URLs (both on the target site and linked externally), URLs that contain query parameters, email addresses and social media account references, files such as PDFs and images, API keys or authentication tokens left exposed in page source, JavaScript files and the API endpoints buried inside them, subdomain references, and DNS-related data. Results are saved in an organized folder structure and can also be exported as JSON.

A useful feature is the --wayback option, which pulls historical URLs from archive.org. Photon can then start crawling from a list of pages captured in the past, which sometimes surfaces content that is no longer linked from the live site.

Several plugins extend the core functionality: one integrates with dnsdumpster for DNS data, another connects to the Wayback Machine, and an exporter handles formatted output. The tool can also run inside a Docker container if you prefer not to install its dependencies directly.

Photon is highly configurable: you can set timeouts, add delays between requests, provide seed URLs to start from, or exclude URLs that match a pattern you want to skip. The project is licensed under GPL v3.
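Several of Photon's configuration options (seed URLs, delays between requests, exclude patterns) can be pictured with a small breadth-first crawler. This is a simplified sketch, not Photon's implementation. The function and parameter names are hypothetical and do not mirror Photon's command-line flags, and the `max_level` depth limit is an added assumption. Links are found with a naive `href` regex, and `fetch` is passed in so the logic runs without network access. Results are returned as a dictionary and serialized with `json.dumps`, loosely mirroring the JSON export.

```python
import json
import re
import time
from collections import deque
from urllib.parse import urljoin, urlparse

# Naive link finder (an assumption); a real crawler should parse HTML properly.
HREF_RE = re.compile(r"""href=["']([^"'#]+)""")


def crawl(seeds, fetch, delay=0.0, exclude=None, max_level=2):
    """Hypothetical breadth-first crawl restricted to the seeds' hosts.

    fetch(url) -> HTML string, injected so the sketch needs no network.
    delay: seconds to sleep between consecutive requests.
    exclude: regex; any discovered URL matching it is skipped.
    max_level: how many link hops away from the seeds to follow.
    """
    exclude_re = re.compile(exclude) if exclude else None
    hosts = {urlparse(s).netloc for s in seeds}
    queue = deque((s, 0) for s in seeds)
    seen = set(seeds)
    results = {"internal": [], "external": []}
    first = True
    while queue:
        url, level = queue.popleft()
        # Pause before every request except the first one.
        if not first and delay:
            time.sleep(delay)
        first = False
        results["internal"].append(url)
        for href in HREF_RE.findall(fetch(url)):
            link = urljoin(url, href)
            if exclude_re and exclude_re.search(link):
                continue
            # Links to other hosts are recorded but never followed.
            if urlparse(link).netloc not in hosts:
                if link not in results["external"]:
                    results["external"].append(link)
                continue
            if link not in seen and level < max_level:
                seen.add(link)
                queue.append((link, level + 1))
    return results


def export_json(results):
    """Serialize crawl results as pretty-printed JSON."""
    return json.dumps(results, indent=2)
```

Injecting `fetch` keeps the sketch testable. A real caller could pass a function built on `urllib.request.urlopen` with a timeout.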

Copy-paste prompts

Prompt 1
Give me the Photon command to crawl example.com, extract all emails and external URLs, and export the results as JSON.
Prompt 2
How do I use Photon's --wayback flag to find historical URLs for a domain that may have removed old pages?
Prompt 3
I want to use Photon to find exposed API keys in JavaScript files on a target site. What command and flags should I use?
Prompt 4
How do I run Photon inside Docker and configure it to skip URLs matching a specific pattern?
Prompt 5
Show me how to set a delay between Photon requests to avoid triggering rate limiting on the target site.


Verify against the repo before relying on details.