explaingit

laramies/theharvester

16,199 | Python | Audience · developer | Complexity · 3/5 | Setup · moderate

TLDR

A command-line OSINT tool that collects publicly available information about a domain (emails, subdomains, and IPs) from dozens of public sources, for use in penetration testing and security assessments.

Mindmap

mindmap
  root((theHarvester))
    What it does
      Collect emails
      Find subdomains
      Discover IPs and URLs
    Data Sources
      Search engines
      Cert transparency
      Security search engines
      Breach databases
    Features
      Passive recon
      Active brute-force
      REST API
    Audience
      Pen testers
      Blue team defenders
      Security researchers

Things people build with this

USE CASE 1

Map all subdomains and email addresses exposed for a target organization before a penetration test.

USE CASE 2

Audit what information about your own company is publicly visible to outside attackers.

USE CASE 3

Automate reconnaissance by integrating theHarvester via its REST API into a larger security pipeline.
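As a rough illustration of the pipeline integration in use case 3, the sketch below flattens a JSON scan result into CSV rows using only the Python standard library. The field names `emails`, `hosts`, and `ips` and the function name `results_to_csv` are assumptions made for this example, not theHarvester's confirmed response schema, so check the real API output before adapting it.

```python
import csv
import io
import json


def results_to_csv(payload: str) -> str:
    """Convert a JSON scan result into CSV text with one finding per row.

    The keys read here ("emails", "hosts", "ips") are hypothetical and
    stand in for whatever the real response actually contains.
    """
    data = json.loads(payload)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["kind", "value"])
    for kind in ("emails", "hosts", "ips"):
        # De-duplicate and sort so repeated runs produce stable reports.
        for value in sorted(set(data.get(kind, []))):
            writer.writerow([kind, value])
    return buf.getvalue()


if __name__ == "__main__":
    sample = json.dumps({
        "emails": ["b@example.com", "a@example.com"],
        "hosts": ["www.example.com"],
    })
    print(results_to_csv(sample), end="")
```

Missing keys are treated as empty lists, so a partial result still produces a valid report with just the header and whatever findings are present.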

Tech stack

Python · Docker · uv

Getting it running

Difficulty · moderate | Time to first run · 30min

Several of the most powerful data sources require API keys from third-party providers, and their free tiers allow only a limited number of requests per day.

License terms were not mentioned in the explanation.

In plain English

theHarvester is a reconnaissance tool used in the early information-gathering stage of a penetration test or red-team assessment. Its job is to collect publicly available information about a given domain (names, email addresses, IP addresses, subdomains, and URLs) so a security team can see what an outside attacker would be able to find about their organisation. This is called OSINT, short for open-source intelligence, because everything is pulled from public resources.

The tool runs as a command-line program. You give it a domain to target, and it queries a long list of "passive" data sources in turn: public search engines like Baidu, Brave, DuckDuckGo, Mojeek and Yahoo; certificate transparency logs through crt.sh and Cert Spotter; security-focused search engines like Shodan, Censys, Netlas, FOFA, ZoomEye and SecurityTrails; breach-checking services like haveibeenpwned and DeHashed; and email-finder services like Hunter and RocketReach, among many others. Some sources are free, others need an API key, and the README lists the free quotas and paid tiers. On top of that, "active" modules can brute-force subdomain names from a dictionary and take screenshots of discovered subdomains.

You would reach for theHarvester if you are a penetration tester scoping out a target's external attack surface, a blue-team defender wanting to see what is exposed about your own organisation, or a security researcher doing reconnaissance. An optional REST API, protected by an API key, allows the tool to be integrated with other systems. It is written in Python (3.12 or higher) and uses the uv package manager for installation. It can also be run from a prebuilt Docker image.
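The dictionary-based subdomain guessing and the pooling of results from several sources described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not theHarvester's actual code: the function names `candidate_hosts` and `merge_findings` are hypothetical, and a real brute-force module would go on to resolve each candidate over DNS, a step omitted here.

```python
from itertools import chain


def candidate_hosts(domain, wordlist):
    """Build candidate hostnames by prefixing each dictionary word to the domain.

    Blank entries and '#' comment lines are skipped, and duplicates are
    dropped while keeping the original wordlist order.
    """
    seen = set()
    hosts = []
    for word in wordlist:
        label = word.strip().lower()
        if not label or label.startswith("#"):
            continue
        host = f"{label}.{domain}"
        if host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


def merge_findings(*source_results):
    """Union the hostnames reported by several sources into one sorted list.

    Names are lower-cased and any trailing dot is removed so the same host
    reported in different spellings is counted once.
    """
    cleaned = (
        name.strip().lower().rstrip(".")
        for name in chain.from_iterable(source_results)
        if name.strip()
    )
    return sorted(set(cleaned))


if __name__ == "__main__":
    print(candidate_hosts("example.com", ["www", "mail", "WWW", ""]))
    # ['www.example.com', 'mail.example.com']
    print(merge_findings(["www.example.com", "API.example.com."], ["api.example.com"]))
    # ['api.example.com', 'www.example.com']
```

Normalising names before merging matters because different sources may report the same host with different casing or a trailing dot, which would otherwise inflate the result count.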

Copy-paste prompts

Prompt 1
Using theHarvester, write me a shell command to gather subdomains and emails for the domain example.com using Shodan and crt.sh as sources.
Prompt 2
I want to run theHarvester from Docker against my own domain to audit our external exposure. Walk me through the setup steps.
Prompt 3
How do I configure API keys for theHarvester's paid data sources, such as SecurityTrails and Hunter, so they load automatically on each run?
Prompt 4
Generate a Python script that calls theHarvester REST API to run a scan and parse the JSON results into a CSV report.

Verify against the repo before relying on details.