explaingit

ccfos/nightingale

13,017 · Go · Audience · ops, devops · Complexity · 4/5 · License · Apache 2.0 · Setup · hard

TLDR

An open-source alerting platform that connects to your existing data stores like Prometheus or Elasticsearch, watches for problems, and fires notifications through 20+ channels including SMS, phone, Slack, and email.

Mindmap

mindmap
  root((nightingale))
    What it does
      Alerting engine
      Rule management
      Notification routing
    Data Sources
      Prometheus
      VictoriaMetrics
      Elasticsearch
      ClickHouse
    Notifications
      Slack and email
      SMS and phone
      DingTalk
    Features
      Mute rules
      Event pipeline
      Distributed mode


Things people build with this

USE CASE 1

Set up alert rules that watch Prometheus or VictoriaMetrics data and send SMS or Slack messages when thresholds are crossed

USE CASE 2

Run distributed alerting at a remote site that keeps firing alerts even when the network link to the main cluster goes down

USE CASE 3

Route different alerts to specific teams using subscription rules, and silence noisy alerts automatically during maintenance windows

USE CASE 4

Trigger a remediation script automatically when an alert fires, such as clearing disk space when a disk-full condition is detected

Tech stack

Go

Getting it running

Difficulty · hard · Time to first run · 1 day+

Requires an existing metrics backend, such as Prometheus or VictoriaMetrics, to be running already; data collection needs the separate Categraf agent.

Apache 2.0: use, modify, and distribute freely for any purpose including commercial, as long as you include the Apache license notice.

In plain English

Nightingale is an open-source monitoring and alerting tool. The README compares it to Grafana: where Grafana puts its energy into charts and dashboards, Nightingale puts its energy into the alerting engine and the rules that control how alerts reach people. The project was originally built by DiDi and was donated to the China Computer Federation open-source committee in 2022.

Nightingale does not collect monitoring data itself. Instead, it connects to data stores that already hold the metrics or logs, such as Prometheus, VictoriaMetrics, Elasticsearch, Loki, ClickHouse, or several relational databases. Once a data source is connected, teams configure alert rules inside Nightingale, which then watches those sources and fires notifications when conditions are met. The companion tool Categraf, maintained separately, handles the actual collection from operating systems, network devices, and databases.

Notification delivery covers 20 built-in channels, such as phone calls, SMS, email, DingTalk, and Slack. Teams can define mute rules to suppress noise during maintenance, subscription rules to route alerts to the right people, and message templates to control what a notification looks like. An event pipeline lets you append metadata to alerts or trigger automatic remediation scripts, for example clearing disk space when a disk-full alert fires.

For teams that operate remote sites with unreliable network connections to the central server, Nightingale supports a distributed alerting mode. A lightweight component called n9e-edge can run on-site and keep firing alerts even if the network link to the main cluster is down.

The README explicitly notes where Nightingale is not the right fit: if a team needs on-call scheduling, escalation policies, or unified noise reduction across many monitoring systems, it suggests looking at purpose-built on-call products instead. Nightingale is licensed under the Apache License 2.0.
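The event pipeline and message templates described above can be pictured as a chain of functions that modify an event, followed by a template that renders it. The sketch below uses Go's standard `text/template`; the `Event` fields, the `addOwner` step, the inventory lookup, and the template text are all hypothetical, and Nightingale's own template variables should be checked in its documentation.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Event is a hypothetical alert event passing through a pipeline.
type Event struct {
	RuleName string
	Severity int
	Labels   map[string]string
}

// Processor is one hypothetical pipeline step that may modify the event.
type Processor func(*Event)

// addOwner returns a step that appends an "owner" label looked up by host
// in a hypothetical inventory table.
func addOwner(inventory map[string]string) Processor {
	return func(e *Event) {
		if owner, ok := inventory[e.Labels["host"]]; ok {
			e.Labels["owner"] = owner
		}
	}
}

// msgTmpl is a hypothetical notification template in text/template syntax.
var msgTmpl = template.Must(template.New("msg").Parse(
	`[S{{.Severity}}] {{.RuleName}} on {{index .Labels "host"}} (owner: {{index .Labels "owner"}})`))

// render fills the template with the event's fields.
func render(e Event) (string, error) {
	var b strings.Builder
	if err := msgTmpl.Execute(&b, e); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	e := Event{RuleName: "disk_full", Severity: 1, Labels: map[string]string{"host": "web-01"}}
	pipeline := []Processor{addOwner(map[string]string{"web-01": "team-storage"})}
	for _, step := range pipeline {
		step(&e)
	}
	msg, err := render(e)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(msg) // [S1] disk_full on web-01 (owner: team-storage)
}
```

Keeping each pipeline step as a small function over `*Event` makes steps easy to reorder or reuse across rules.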

Copy-paste prompts

Prompt 1
Write a Nightingale alert rule that fires when CPU usage exceeds 90% for 5 minutes and sends a Slack notification
Prompt 2
How do I connect Nightingale to a VictoriaMetrics datasource and create my first disk-space alert rule?
Prompt 3
Set up a Nightingale mute rule that suppresses all alerts every Saturday between 2am and 4am during a maintenance window
Prompt 4
Show me how to configure n9e-edge for a remote site with an unreliable connection to the main Nightingale cluster
Prompt 5
Write a Nightingale event pipeline action that runs a shell script to free disk space when a disk-full alert fires


Verify against the repo before relying on details.