Set up alert rules that watch Prometheus or VictoriaMetrics data and send SMS or Slack messages when thresholds are crossed
Run distributed alerting at a remote site that keeps firing alerts even when the network link to the main cluster goes down
Route different alerts to specific teams using subscription rules, and silence noisy alerts automatically during maintenance windows
Trigger a remediation script automatically when an alert fires, such as clearing disk space when a disk-full condition is detected
Requires an existing metrics backend, such as Prometheus or VictoriaMetrics, to be already running; data collection is handled by the separate Categraf agent.
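At its core, a threshold alert rule runs a query against the data source, compares each returned series with a threshold, and emits an event for every series that crosses it. The Python sketch below illustrates that idea against the JSON shape returned by Prometheus's `/api/v1/query` instant-query endpoint, which VictoriaMetrics also serves. It is not Nightingale's code. The `AlertEvent` type, the `evaluate_threshold` function, the rule name and the `instance`/`mountpoint` labels are hypothetical. The HTTP call is left out so the example runs offline.

```python
import operator
from dataclasses import dataclass


# Hypothetical event type for illustration; Nightingale's real event model differs.
@dataclass
class AlertEvent:
    rule_name: str
    labels: dict
    value: float


# Comparison operators an illustrative rule might allow.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def evaluate_threshold(rule_name, response, op, threshold):
    """Return one AlertEvent per series in a Prometheus instant-vector
    response whose sample value satisfies `value <op> threshold`."""
    if response.get("status") != "success":
        raise ValueError("query failed: %r" % response.get("error"))
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector, got %s" % data["resultType"])
    events = []
    for series in data["result"]:
        # Each instant-vector sample is [unix_timestamp, "value as a string"].
        _ts, raw = series["value"]
        value = float(raw)
        if OPS[op](value, threshold):
            events.append(AlertEvent(rule_name, dict(series["metric"]), value))
    return events


# A response in the shape returned by GET /api/v1/query (values are made up).
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "web-1", "mountpoint": "/"}, "value": [1700000000, "93.5"]},
            {"metric": {"instance": "web-2", "mountpoint": "/"}, "value": [1700000000, "41.0"]},
        ],
    },
}

for event in evaluate_threshold("disk_used_percent_high", sample, ">", 90):
    print(event.labels["instance"], event.value)  # prints: web-1 93.5
```

PromQL-based rules usually embed the comparison in the query itself (for example `disk_used_percent > 90`), so every returned series is already a firing one. The explicit comparison step here only makes the threshold check visible.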
Nightingale is an open-source monitoring and alerting tool. Its README contrasts it with Grafana: where Grafana concentrates on charts and dashboards, Nightingale concentrates on the alerting engine and on the rules that control how alerts reach people. The project was originally built by DiDi and was donated to the China Computer Federation (CCF) open-source committee in 2022.

Nightingale does not collect monitoring data itself. Instead, it connects to data stores that already hold metrics or logs, such as Prometheus, VictoriaMetrics, Elasticsearch, Loki, ClickHouse, and several relational databases. Once a data source is connected, teams configure alert rules in Nightingale, which then evaluates those rules against the data source and sends notifications when their conditions are met. Collection from operating systems, network devices, and databases is handled by the companion tool Categraf, which is maintained as a separate project.

Notifications can be delivered through 20 built-in channels, including phone calls, SMS, email, DingTalk, and Slack. Mute rules suppress noise during maintenance, subscription rules route alerts to the right people, and message templates control what each notification looks like. An event pipeline can append metadata to alerts or trigger automatic remediation scripts, for example clearing disk space when a disk-full alert fires.

For teams that operate remote sites with unreliable network connections to the central server, Nightingale supports a distributed alerting mode: a lightweight component called n9e-edge runs on-site and keeps firing alerts even when the link to the main cluster is down.

The README is explicit about where Nightingale is not the right fit: teams that need on-call scheduling, escalation policies, or unified noise reduction across many monitoring systems are pointed toward purpose-built on-call products instead. Nightingale is licensed under the Apache License 2.0.
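Mute rules and subscription rules both come down to matching labels on an alert event. The Python sketch below shows one way that could work. An event is dropped if a mute rule's label matchers match it and the current time falls inside the rule's maintenance window. Otherwise it is routed to every team whose subscription matchers match. This does not reflect Nightingale's actual configuration model. The `MuteRule` and `Subscription` types, the exact-equality matching, and the team and label names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MuteRule:
    # Hypothetical: every matcher label must equal the event's label.
    matchers: dict
    start: datetime
    end: datetime


@dataclass
class Subscription:
    team: str
    matchers: dict


def labels_match(matchers, labels):
    """True if every matcher key/value pair appears in the event labels."""
    return all(labels.get(k) == v for k, v in matchers.items())


def is_muted(labels, mute_rules, now):
    """An event is muted if any rule matches it and `now` is inside [start, end)."""
    return any(
        labels_match(rule.matchers, labels) and rule.start <= now < rule.end
        for rule in mute_rules
    )


def route(labels, mute_rules, subscriptions, now):
    """Return the sorted list of teams to notify; empty if the event is muted."""
    if is_muted(labels, mute_rules, now):
        return []
    return sorted({s.team for s in subscriptions if labels_match(s.matchers, labels)})


# A maintenance window on a hypothetical database cluster.
mutes = [MuteRule({"cluster": "db-prod"}, datetime(2024, 5, 1, 22, 0), datetime(2024, 5, 2, 2, 0))]
subs = [
    Subscription("dba", {"service": "mysql"}),
    Subscription("sre", {"severity": "critical"}),
]
event = {"cluster": "db-prod", "service": "mysql", "severity": "critical"}

print(route(event, mutes, subs, datetime(2024, 5, 1, 23, 30)))  # [] (inside the window)
print(route(event, mutes, subs, datetime(2024, 5, 2, 9, 0)))    # ['dba', 'sre']
```

Muting is checked before routing, so a maintenance window silences an event for every subscriber at once rather than requiring each subscription to know about it.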
Verify against the repo before relying on details.