explaingit

thanos-io/thanos

14,068
Go
Audience · ops devops
Complexity · 4/5
Setup · hard

TLDR

Thanos extends Prometheus with effectively unlimited long-term metric retention in cloud object storage, a global query view across multiple clusters, and high availability, all without replacing your existing Prometheus setup.

Mindmap

mindmap
  root((Thanos))
    What it does
      Extends Prometheus
      Long-term storage
      Global query view
    Components
      Sidecar
      Store Gateway
      Querier
      Compactor
    Storage backends
      Amazon S3
      Google Cloud Storage
      Azure Blob
    Features
      Cross-cluster queries
      HA deduplication
      Data downsampling
    Integration
      Adds to existing Prometheus
      gRPC API

Things people build with this

USE CASE 1

Add long-term metric retention to an existing Prometheus setup by shipping data to S3 or Google Cloud Storage instead of running out of local disk.

USE CASE 2

Query metrics from multiple Prometheus instances across regions or clusters with a single PromQL query.

USE CASE 3

Run two identical Prometheus instances for redundancy and use Thanos to deduplicate their data so queries return complete results even when one goes down.

USE CASE 4

Downsample old Prometheus data to reduce storage costs and speed up queries that span months of history.
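
As a rough illustration of the downsampling in use case 4, the sketch below collapses raw samples into fixed time windows and keeps a few aggregates per window. It is a simplified model, not Thanos's implementation: the `downsample` function, the `(timestamp_ms, value)` sample layout, and the choice of aggregates are assumptions made for this example.

```python
def downsample(samples, window_ms):
    """Group (timestamp_ms, value) samples into fixed windows.

    Hypothetical helper for illustration only. Each window keeps
    count, sum, min, and max so averages and extremes can still be
    computed without the raw samples.
    """
    buckets = {}
    for ts, value in samples:
        # Align the timestamp to the start of its window.
        start = ts - ts % window_ms
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    # Return windows in chronological order.
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    raw = [(0, 1.0), (60_000, 3.0), (300_000, 5.0)]
    # Five-minute windows, expressed in milliseconds.
    for start, agg in downsample(raw, 300_000).items():
        print(start, agg)
```

A query over months of history then reads one aggregate per window instead of every raw sample, which is where the speed-up comes from.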

Tech stack

Go · Prometheus · gRPC

Getting it running

Difficulty · hard
Time to first run · 1h+

Requires an existing Prometheus setup plus access to an object storage bucket such as Amazon S3, Google Cloud Storage, or Azure Blob Storage.

No license information is mentioned in the explanation.

In plain English

Thanos is a set of tools that extend Prometheus, a popular open-source monitoring system used to collect and query metrics from servers and services. Prometheus works well for a single machine or cluster, but it stores data locally, which means data can be lost if the machine goes down, storage fills up over time, and there is no built-in way to query metrics from multiple Prometheus instances at once. Thanos was built to solve all three problems.

The main things Thanos adds are a global query view, unlimited long-term storage, and high availability. The global query view lets you send one query and have it reach all of your Prometheus servers across multiple clusters or regions, with results merged automatically.

For storage, Thanos ships metric data from Prometheus into any object storage service (like Amazon S3, Google Cloud Storage, or Azure Blob Storage) where data can be kept indefinitely at low cost. Older data can also be downsampled to make queries over long time ranges faster.

For high availability, teams sometimes run two identical Prometheus instances pointed at the same targets. Thanos can merge their data on the fly and remove duplicates, so if one instance fails, queries still return complete results.

Thanos integrates with existing Prometheus setups by adding small sidecar components next to your current Prometheus servers. No major restructuring of your monitoring setup is required. The project supports cross-cluster federation, fault-tolerant query routing, and exposes a gRPC API that other tools can build on. Thanos is written in Go and is an incubating project at the Cloud Native Computing Foundation.
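
The high-availability merge described above can be pictured with a small sketch. It assumes each replica attaches a distinguishing label (called `replica` here, a hypothetical name) and treats two series as the same once that label is dropped. The `deduplicate` helper and its data layout are illustrative only; Thanos's own deduplication logic is more involved than this first-sample-wins merge.

```python
from collections import defaultdict


def deduplicate(series_list, replica_label="replica"):
    """Merge series that differ only by the replica label.

    Hypothetical helper for illustration. `series_list` holds
    (labels, samples) pairs, where labels is a dict and samples is a
    list of (timestamp, value) tuples.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        # Identity of a series once the replica label is ignored.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            # Keep the first value seen for a timestamp; later
            # replicas only fill in timestamps that are missing.
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


if __name__ == "__main__":
    replica_a = ({"job": "api", "replica": "a"}, [(10, 1.0), (20, 2.0)])
    replica_b = ({"job": "api", "replica": "b"}, [(20, 2.0), (30, 3.0)])
    print(deduplicate([replica_a, replica_b]))
```

If one replica is missing a stretch of samples, for example because it was down, the other replica's samples fill the gap, which is the behaviour the paragraph on high availability describes.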

Copy-paste prompts

Prompt 1
I'm running Prometheus on Kubernetes and want to add Thanos Sidecar to ship metrics to an S3 bucket. Show me the sidecar deployment YAML and the Thanos Store configuration to query that historical data later.
Prompt 2
Set up Thanos Querier to merge results from two Prometheus instances in different namespaces and expose a single PromQL endpoint with deduplication enabled.
Prompt 3
My Prometheus storage fills up after 15 days. Walk me through adding Thanos Compactor to downsample data older than 40 days and delete it from local disk after shipping to Google Cloud Storage.
Prompt 4
Configure Thanos Ruler to evaluate PromQL recording and alerting rules against data stored in object storage so I can alert on historical data rather than just local Prometheus data.
Prompt 5
I have three Prometheus clusters in us-east-1, eu-west-1, and ap-southeast-1. Design a Thanos architecture that lets me run a single PromQL query across all three and store their data in one S3 bucket.

Verify against the repo before relying on details.