explaingit

keephq/keep

11,822 Python Audience · ops devops Complexity · 3/5 Setup · moderate

TLDR

An open-source alert management platform that pulls alerts from Datadog, Grafana, PagerDuty, and other tools into one dashboard, with deduplication, AI correlation, and automated workflows that trigger Slack messages or Jira tickets.

Mindmap

mindmap
  root((keep))
    What it does
      Alert aggregation
      Deduplication
      Correlation
    Integrations
      Datadog Grafana
      PagerDuty
      Slack and Jira
    AI Features
      Incident summary
      Root cause grouping
    Workflow Engine
      Trigger on alert
      Auto-remediation
    Use Cases
      On-call reduction
      Incident response

Things people build with this

USE CASE 1

Aggregate alerts from Datadog, Grafana, and PagerDuty into one dashboard so on-call engineers stop switching between tools.

USE CASE 2

Set up a workflow that automatically posts to Slack and opens a Jira ticket when a high-severity alert fires.

USE CASE 3

Deduplicate and correlate multiple related alerts about the same outage into a single incident view to reduce noise.

USE CASE 4

Use AI integration to automatically summarize an active incident and gather context from linked monitoring tools.
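
The workflow in use case 2 (a condition on the alert, followed by a list of actions) can be sketched in plain Python. This is only an illustration of the trigger-then-act pattern, not Keep's workflow format or API: the `Workflow` class, the `post_to_slack` and `open_jira_ticket` functions, and the alert fields `name` and `severity` are all hypothetical, and the actions return strings instead of calling Slack or Jira.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types: an alert is a plain dict, an action maps an alert to a log line.
Action = Callable[[dict], str]


def post_to_slack(alert: dict) -> str:
    # Placeholder: a real action would call the Slack API here.
    return f"slack: {alert['name']} ({alert['severity']})"


def open_jira_ticket(alert: dict) -> str:
    # Placeholder: a real action would call the Jira API here.
    return f"jira: ticket opened for {alert['name']}"


@dataclass
class Workflow:
    name: str
    condition: Callable[[dict], bool]
    actions: list[Action] = field(default_factory=list)

    def run(self, alert: dict) -> list[str]:
        """Run every action in order if the alert matches; otherwise do nothing."""
        if not self.condition(alert):
            return []
        return [action(alert) for action in self.actions]


notify_on_high = Workflow(
    name="notify-on-high-severity",
    condition=lambda a: a.get("severity") in {"high", "critical"},
    actions=[post_to_slack, open_jira_ticket],
)

results = notify_on_high.run({"name": "API 5xx spike", "severity": "high"})
skipped = notify_on_high.run({"name": "Disk at 70%", "severity": "info"})
```

Here `results` holds one entry per action for the high-severity alert, while `skipped` is empty because the condition did not match. Consult the Keep documentation for the actual workflow definition syntax.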

Tech stack

Python · Anthropic · OpenAI · Ollama · Gemini

Getting it running

Difficulty · moderate Time to first run · 30min

Needs API keys for each monitoring tool you want to connect, plus an AI provider key if you want AI correlation features.

In plain English

Keep is an open-source platform for managing alerts from multiple monitoring tools in one place. Operations teams typically use many different monitoring services such as Datadog, Grafana, PagerDuty, and CloudWatch, each sending their own alerts. Keep aggregates all of these into a single interface so engineers can see and respond to everything without switching between tools.

The core features include alert deduplication (combining multiple alerts about the same issue into one), correlation (grouping related alerts together so an incident with five symptoms appears as one cluster rather than five separate notifications), enrichment (adding extra context to alerts automatically), and filtering. Alerts can be acknowledged, snoozed, or routed to the right team through a customizable dashboard.

Keep also includes a workflow engine, described by the project as "GitHub Actions for your monitoring tools." Workflows are automated sequences that trigger when certain alert conditions are met, such as sending a message to Slack, creating a ticket in Jira, or running a script to restart a failing service. The integrations are bidirectional, meaning Keep can both receive alerts from external tools and push actions back to them.

AI features are built in through connections to several AI providers, including Anthropic, OpenAI, Gemini, DeepSeek, and local models via Ollama. These are used for tasks like summarizing an incident, correlating alerts that share a root cause, or gathering additional context automatically when an alert fires.

Keep is written in Python. A hosted version is available at platform.keephq.dev for trying it out, and documentation is at docs.keephq.dev. The full README is longer than what was shown.
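
To make the difference between deduplication and correlation concrete, here is a minimal Python sketch. It is not Keep's implementation: the `Alert` fields, the fingerprint rule (service plus lowercased alert name), and the choice to correlate by service are assumptions made purely for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str    # tool that sent the alert, e.g. "datadog"
    service: str   # hypothetical field: the affected service
    name: str      # alert title
    severity: str


def fingerprint(alert: Alert) -> str:
    """Hash the fields that identify 'the same issue', ignoring the source tool."""
    key = f"{alert.service}|{alert.name}".lower()
    return hashlib.sha256(key.encode()).hexdigest()[:12]


def deduplicate(alerts: list[Alert]) -> list[Alert]:
    """Keep the first alert seen for each fingerprint."""
    seen: dict[str, Alert] = {}
    for alert in alerts:
        seen.setdefault(fingerprint(alert), alert)
    return list(seen.values())


def correlate(alerts: list[Alert]) -> dict[str, list[Alert]]:
    """Group alerts by service so one outage shows up as one cluster."""
    groups: dict[str, list[Alert]] = defaultdict(list)
    for alert in alerts:
        groups[alert.service].append(alert)
    return dict(groups)


alerts = [
    Alert("datadog", "checkout", "High latency", "high"),
    Alert("grafana", "checkout", "high latency", "high"),  # same issue, other tool
    Alert("pagerduty", "checkout", "Error rate spike", "critical"),
    Alert("datadog", "search", "Disk almost full", "warning"),
]

unique = deduplicate(alerts)   # the Grafana copy is dropped
incidents = correlate(unique)  # two clusters: "checkout" and "search"
```

Deduplication collapses the two "high latency" alerts into one, and correlation then groups the remaining checkout alerts into a single cluster. A real system would likely use richer rules (time windows, labels, or the AI-based grouping mentioned above) rather than a fixed key.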

Copy-paste prompts

Prompt 1
Set up Keep to ingest alerts from my Datadog account and deduplicate them. Show me the integration config and where to put my API key.
Prompt 2
Write a Keep workflow definition that triggers when a high-severity Grafana alert fires, then sends a Slack message and opens a PagerDuty incident.
Prompt 3
How do I configure Keep to use Anthropic Claude for alert correlation, and what does the AI receive as input when grouping related alerts?
Prompt 4
Deploy Keep locally with Docker Compose and connect it to an existing Prometheus setup. Show me the full config.
Prompt 5
Create a Keep workflow that restarts a failing service via a shell script when a specific alert fires, then auto-acknowledges the alert.

Verify against the repo before relying on details.