chaos-jetzt-nixfiles

Moritz 'e1mo' Fromm 3acc1865c0 services/monitoring: Setup	2023-01-06 15:51:22 +01:00
default.nix	services/monitoring: Setup	2023-01-06 15:51:22 +01:00
rules.yaml	services/monitoring: Setup	2023-01-06 15:51:22 +01:00

The goal is to create a monitoring setup where each server monitors
itself when it comes to failing systemd services, disk or RAM filling up,
…. In addition, each Prometheus instance will monitor the remote Prometheus
and Alertmanager instances for signs of failure (e.g. being unreachable,
errors in notification delivery, dropped alerts).
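
A minimal sketch of what such a layout could look like as a NixOS module,
not the contents of default.nix. The peer host name is a hypothetical
placeholder, and 9090 and 9093 are only the upstream default ports for
Prometheus and Alertmanager.

```nix
{ config, ... }:
{
  services.prometheus = {
    enable = true;

    # Local node exporter; the systemd collector exposes unit states,
    # which is one way to notice failing services.
    exporters.node = {
      enable = true;
      enabledCollectors = [ "systemd" ];
    };

    scrapeConfigs = [
      {
        # Each server scrapes its own node exporter.
        job_name = "node";
        static_configs = [
          { targets = [ "localhost:${toString config.services.prometheus.exporters.node.port}" ]; }
        ];
      }
      {
        # Remote Prometheus instances (hypothetical host name).
        job_name = "prometheus-peers";
        static_configs = [ { targets = [ "peer1.example.org:9090" ]; } ];
      }
      {
        # Remote Alertmanager instances (hypothetical host name).
        job_name = "alertmanager-peers";
        static_configs = [ { targets = [ "peer1.example.org:9093" ]; } ];
      }
    ];
  };
}
```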

A lot of metrics (especially histograms from Prometheus or Alertmanager)
are being dropped before ingestion to save on disk space and memory.
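
Dropping metrics before ingestion is typically done with
metric_relabel_configs on a scrape job. The following standalone fragment
is a sketch under that assumption. The job name, target and regex are
illustrative and not taken from default.nix.

```nix
{
  services.prometheus.scrapeConfigs = [
    {
      job_name = "remote-prometheus"; # hypothetical job name
      static_configs = [ { targets = [ "peer1.example.org:9090" ]; } ];

      # Applied after the scrape but before samples are stored, so the
      # matching series never reach the TSDB.
      metric_relabel_configs = [
        {
          source_labels = [ "__name__" ];
          # Histogram bucket series of Prometheus and Alertmanager
          # (illustrative pattern; Prometheus anchors it on both ends).
          regex = "(prometheus|alertmanager)_.*_bucket";
          action = "drop";
        }
      ];
    }
  ];
}
```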

Depending on how many servers we have in the future, this could
probably use some kind of overhaul, since we right now have n^2
monitoring peer relationships (not to mention possibly duplicated
alerts).
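
To illustrate how the number of peer relationships grows, here is a small
Nix expression with hypothetical host names. With every host scraping
every other host, the count is n * (n - 1), which is on the order of n^2.

```nix
let
  # Hypothetical host names, not taken from the repository.
  hosts = [ "a.example.org" "b.example.org" "c.example.org" ];

  # Every host monitors every other host, but not itself.
  peersOf = self: builtins.filter (h: h != self) hosts;

  n = builtins.length hosts;
in
{
  # Attribute set mapping each host to the peers it would scrape.
  peers = builtins.listToAttrs (map (h: { name = h; value = peersOf h; }) hosts);

  # Directed monitoring relationships: 6 for the three hosts above.
  relationships = n * (n - 1);
}
```

Going from 3 to 10 hosts raises the count from 6 to 90 per monitored
service, which is why the setup may need an overhaul as the fleet grows.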
