What do y’all use to monitor many linux servers?

shootwhatsmyname@lemm.ee · edit-2 8 months ago

𝒎𝒂𝒏𝒊𝒆𝒍@lemmy.ml · 9 months ago

Telegraf + InfluxDB + Grafana is what I use at work. It's a multi-purpose stack, though, and can be used to monitor EVERYTHING.

SGH@lemmy.ml · 9 months ago

I use LibreNMS, since it monitors over SNMP (which is available pretty much everywhere). I don’t believe it has HTTP alerts, but I know for a fact that it can send Telegram messages.

hindy@mbin.lovetux.net · 9 months ago

Hello,

I’m still using Nagios here. And for the availability of the services I’m using uptime-kuma (in a Docker container).

elucubra@sopuli.xyz · 9 months ago

Ages ago I used to use Webmin. I have no clue as to how it stacks up against others nowadays.

static09@lemmy.world · 9 months ago

Check out Netdata or Zabbix.

utopiah@lemmy.ml · 9 months ago

send alerts via http request

On this specifically you might want to check ntfy, as it’s quite easy to set up and can give you notifications on pretty much any device (including iOS) via your own infrastructure, all the way down to basics, e.g. SSE. That means you can subscribe to a topic, e.g. servers per physical location, alert level, etc., and only get the ones you need.
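For anyone wondering what "alerts via HTTP request" looks like with ntfy: publishing is a plain HTTP POST to the topic URL, with the message as the body and optional `Title` / `Priority` headers. Below is a minimal sketch using only the Python standard library. The server URL `https://ntfy.example.com`, the topic name, and the helper names `build_ntfy_request` / `send_alert` are made up for illustration; swap in your own instance and topics.

```python
import urllib.request


def build_ntfy_request(base_url, topic, message, title=None, priority=None):
    """Build (but do not send) an HTTP POST that publishes `message` to an ntfy topic."""
    headers = {}
    if title:
        headers["Title"] = title
    if priority:
        headers["Priority"] = priority
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send_alert(request, timeout=10):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical server and topic; one topic per location/alert level
    # lets each subscriber pick only what they need.
    req = build_ntfy_request(
        "https://ntfy.example.com",
        "servers-rack1-critical",
        "disk / is 95% full on web01",
        title="web01",
        priority="high",
    )
    print(req.get_method(), req.full_url)
    # send_alert(req)  # uncomment once the URL points at a real ntfy server
```

Keeping request building separate from sending makes it easy to dry-run from a cron job or monitoring hook before pointing it at a real server.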

utopiah@lemmy.ml · 9 months ago

Node exporter, Prometheus and grafana

Otherwise much heavier but that’s also what I use.

MrPoopyButthole@lemmy.dbzer0.com · 9 months ago

same

ddh@lemmy.sdf.org · 9 months ago

I use my family. It has a simple volume based alert for when services are offline.

vfsh@lemmy.blahaj.zone · 9 months ago

It’ll even automatically configure variable alert volumes corresponding to the importance of the service!

fmstrat@lemmy.nowsci.com · 9 months ago

Until the UPS battery gets low and it beeps, and they look for a way to turn it off vs calling you. Yup.

CaptSpify@lemmy.today · 9 months ago

https://en.m.wikipedia.org/wiki/Nagios

tath@social.tath.link · 9 months ago

Zabbix is pretty quick and easy. Many different services built in for sending notifications, along with your own custom (including webhooks). Fully customizable dashboard as well so you can add whatever you want/need at a glance.

8adger@lemmy.world · 9 months ago

I stopped by to say the same thing. I use Zabbix to monitor everything

notabot@lemm.ee · 9 months ago

Nagios. It does depend on what you mean by monitor though. Nagios is good at telling you that “service A on host B” is down, but less useful for looking at things like performance trends. I particularly like being able to set up dependencies between services, so I get the alert for the root cause, and not for all of the services that have gone down because of it.
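To illustrate the dependency idea in general terms (this is a toy sketch, not how Nagios implements it, and not Nagios configuration): given a map of which service depends on which, you only alert on a down service if none of its direct dependencies are also down. The service names and the `root_causes` helper are hypothetical.

```python
def root_causes(down, depends_on):
    """Return the down services whose direct dependencies are all still up.

    down:       iterable of service names currently failing
    depends_on: dict mapping a service to the services it needs
    """
    down = set(down)
    return sorted(
        service
        for service in down
        if not any(dep in down for dep in depends_on.get(service, ()))
    )


if __name__ == "__main__":
    depends_on = {
        "web": ["db"],       # web needs the database
        "db": ["storage"],   # the database needs the storage backend
        "mail": [],
    }
    # Storage fails and takes db and web with it: only storage is reported.
    print(root_causes({"web", "db", "storage"}, depends_on))
```

With three services down, only the one at the bottom of the chain is reported, so the alert points at the thing that actually needs fixing.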

RegalPotoo@lemmy.world · 9 months ago

Base ansible role installs Prometheus node exporter, configured with the text file collector
VM automations push DNS records so that the Prometheus dns-sd automatically discovers them
Ansible roles add cron jobs that generate metrics for specific systems and dump them for the text file collector
Grafana for dashboards
Karma as a UI in front of Prometheus alert manager

tetris11@lemmy.ml · 9 months ago

Cron jobs that generate metrics for specific systems and dump them for the text file collector

Details please

RegalPotoo@lemmy.world · 9 months ago

https://github.com/prometheus/node_exporter?tab=readme-ov-file#textfile-collector - which makes node exporter watch a specific directory for files that contain metrics, then re-export them back to the central Prometheus server
Some systems have their own metrics endpoints - instead of getting Prometheus to scrape these directly I set up a Cron job to curl these into files for node exporter - this means I don’t need extra config in Prometheus to find the endpoints, and don’t need to mess with firewall rules
Other systems don’t directly expose metrics in a format Prometheus can use - in this case I will write/find a script that can do the conversion, then either set it up to write the metrics file directly and run it from cron, or run it as a service with another cron job to do the scrape
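A rough sketch of the "script writes the metrics file directly" variant described above, in case it helps. It renders one metric in the Prometheus text exposition format and writes it atomically (temp file, then rename) so node exporter never reads a half-written file. The textfile collector only picks up files ending in `.prom` from the directory given by `--collector.textfile.directory`. The directory path, metric name, and helper names here are assumptions for illustration, not taken from the setup described, and label values are not escaped.

```python
import os
import tempfile


def render_metric(name, value, help_text, labels=None, metric_type="gauge"):
    """Render one metric in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name}{label_str} {value}\n"
    )


def write_prom_file(directory, filename, text):
    """Write `text` to directory/filename atomically via temp file + rename."""
    if not filename.endswith(".prom"):
        raise ValueError("the textfile collector only reads *.prom files")
    # The temp file lives in the same directory so the rename stays on one
    # filesystem; its .tmp suffix keeps the collector from picking it up.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; make it readable
        final_path = os.path.join(directory, filename)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path


if __name__ == "__main__":
    # Hypothetical directory; it must match --collector.textfile.directory.
    out_dir = "/var/lib/node_exporter/textfile"
    text = render_metric(
        "backup_last_success_timestamp_seconds",  # made-up metric name
        1700000000,
        "Unix time of the last successful backup.",
        labels={"job": "nightly"},
    )
    if os.path.isdir(out_dir):
        write_prom_file(out_dir, "backup.prom", text)
    else:
        print(text, end="")
```

Run from cron, this gives Prometheus the metric on the next node exporter scrape without any extra scrape config or firewall changes.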

Toribor@corndog.social · 9 months ago

Any chance you’d be willing to share playbooks or point me toward any resources you used?

I use Ansible to manage config across all my workstations/servers but I haven’t gotten around to automating log shipping yet or aggregating system metrics.

Cysio@lemmygrad.ml · 9 months ago

Zabbix

LainTrain@lemmy.dbzer0.com · 9 months ago

Cockpit.

dkc@lemmy.world · 9 months ago

I’ve been really enjoying Cockpit as well.

corsicanguppy@lemmy.ca · 9 months ago

My Cockpit experience has been uniformly dreadful. I’m glad you’re getting value out of it.

LainTrain@lemmy.dbzer0.com · 9 months ago

How come?

cmc@discuss.tchncs.de · 9 months ago

You can monitor multiple machines via the host switcher menu at the top-left of the screen: Multiple Machines

Mora@pawb.social · 9 months ago

Beszel. Probably the easiest tool of all those mentioned in this thread.

https://github.com/henrygd/beszel

dan@upvote.au · 9 months ago

I’m working on making it easier to install on Debian systems by creating a Debian package (and eventually a repo): https://github.com/henrygd/beszel/pull/497

JustARegularNerd@aussie.zone · 9 months ago

Seconded. My only complaint (which might already be a feature I haven’t found yet) is that it doesn’t seem to support multiple drives. But yes, it is shit easy to set up and has a beautiful UI.

Mora@pawb.social · 9 months ago

Totally possible:

https://beszel.dev/guide/additional-disks

JustARegularNerd@aussie.zone · 9 months ago

I no longer have any complaints about Beszel. Thank you!

shootwhatsmyname@lemm.ee · 8 months ago

Went with this one, thanks! Lots of other good recommendations in the thread too though

spicehoarder@lemm.ee · 9 months ago

Not exactly what you’re looking for, but I like using proxmox