chaos-jetzt-nixfiles

Author	SHA1	Message	Date
Moritz 'e1mo' Fromm	ef147a0e22	services/monitoring: Tie up loose ends Some variables that were intended to be used (e.g. allTargets) were in fact unused, but they will be needed as soon as we have a second non-dev host in our nixfiles.	2023-08-04 16:39:11 +02:00
Moritz 'e1mo' Fromm	047d73dc78	Add cj.deployment module That way we can configure the deployment tags and everything else in a single location.	2023-08-04 16:39:10 +02:00
adb-sh	6c1e6d5811	Update email and ssh key from adb	2023-02-11 22:10:44 +01:00
Moritz 'e1mo' Fromm	935f51e7d9	services/monitoring: Fix missing firewall rule I didn't notice this was missing in #5 until after deploying it. Since the ports on the monitoring-network-interface (ens10) were not open, scraping would fail and thus generate alerts.	2023-01-06 16:07:46 +01:00
Moritz 'e1mo' Fromm	d199834a61	Add adb and admin htpasswd user Also updated instructions for editing the .htpasswd	2023-01-06 15:51:22 +01:00
Moritz 'e1mo' Fromm	3acc1865c0	services/monitoring: Setup The goal is to create a monitoring setup where each server monitors itself when it comes to failing systemd services, disk or RAM filling up, …. In addition, each prometheus will monitor remote prometheus and alertmanager instances for signs of failure (e.g. being unreachable, errors in notification delivery, dropping alerts). A lot of metrics (especially histograms from prometheus or alertmanager) are dropped before ingestion to save on disk space and memory. Depending on how many servers we may or may not have in the future, this could probably use some kind of overhaul, since we right now have n^2 monitoring peer relationships (not even speaking of possible duplicated alerts).	2023-01-06 15:51:22 +01:00

6 commits