As in: when I watch YouTube tutorials, I often see YouTubers with a small widget on their desktop that gives them an overview of their RAM usage, security level, etc. What apps do you all use to track this?

  • borouhin@alien.topB
    11 months ago

    Alerts are much more important than fancy dashboards. You won’t be staring at your dashboard 24/7 and you probably won’t be staring at it when bad things happen.

    Creating your alert set is not easy. Ideally, every problem you encounter should be preceded by a corresponding alert, and no alert should be a false positive (one that requires no action). So if you either hit a problem your monitoring didn’t alert you about, or get an alert that requires no action, sit down and think carefully about what should change in your alerts.
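
    A minimal sketch of that review loop, in Python. Everything here is hypothetical: the `Alert`/`Incident` records, the `audit_alerts` helper and the one-hour `lead_window` are illustrative, and the matching is deliberately crude (any actionable alert shortly before a problem counts as "preceding" it, regardless of what the alert was about).

```python
from dataclasses import dataclass


# Hypothetical records: in practice you would pull these from your alert
# history and your own notes about what actually went wrong.
@dataclass
class Alert:
    name: str
    fired_at: float      # Unix timestamp, seconds
    needed_action: bool  # did anyone actually have to do something?


@dataclass
class Incident:
    description: str
    started_at: float    # Unix timestamp, seconds


def audit_alerts(alerts, incidents, lead_window=3600.0):
    """Return (false_positives, missed_incidents).

    A false positive is an alert that required no action. A missed incident
    is a problem with no actionable alert fired within `lead_window` seconds
    before it started (or at the same moment).
    """
    false_positives = [a.name for a in alerts if not a.needed_action]
    actionable = [a for a in alerts if a.needed_action]
    missed = [
        inc.description
        for inc in incidents
        if not any(inc.started_at - lead_window <= a.fired_at <= inc.started_at
                   for a in actionable)
    ]
    return false_positives, missed


alerts = [
    Alert("DiskAlmostFull", fired_at=1000.0, needed_action=True),
    Alert("CPUSpike", fired_at=5000.0, needed_action=False),
]
incidents = [
    Incident("disk filled up", started_at=2000.0),
    Incident("backup job silently failed", started_at=9000.0),
]
print(audit_alerts(alerts, incidents))
# (['CPUSpike'], ['backup job silently failed'])
```

    Both lists should ideally be empty; anything left in either one is a candidate for a new, removed or retuned alert.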

    As for tools, I recommend Prometheus + Grafana. There’s no need for a separate Alertmanager, which many guides recommend: recent versions of Grafana have excellent built-in alerting. Don’t use the ready-made dashboards; start from scratch, because you need to understand PromQL to set everything up efficiently. Start with a simple dashboard (and alerts!) just for generic server health (node exporter), then add exporters for your specific services, network devices (SNMP), remote hosts (blackbox), SSL certs, etc. Then write your own exporters for whatever you haven’t found :)
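
    If no existing exporter covers something, an exporter is just an HTTP endpoint that serves plain text in the Prometheus exposition format. A minimal stdlib-only sketch, assuming a hypothetical metric `homelab_disk_free_bytes` and an arbitrarily chosen port 9101; for anything serious, the official `prometheus_client` Python library handles this for you.

```python
import http.server
import shutil


def collect():
    """Gather current values as {name: (type, help, value)}.

    Hypothetical metric for illustration: free space on the root filesystem.
    """
    usage = shutil.disk_usage("/")
    return {
        "homelab_disk_free_bytes": ("gauge", "Free bytes on the root filesystem.", usage.free),
    }


def render(metrics):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (mtype, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default; anything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9101):
    # Blocks forever; 9101 is an arbitrary, hypothetical port choice.
    http.server.HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

    Call `serve()` and add `<host>:9101` as a scrape target in your Prometheus config; the metric then becomes available to PromQL queries, Grafana panels and alert rules like any other.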

    • AttitudeImportant585@alien.topB
      11 months ago

      When you’ve got a lot of variables, especially in a distributed system, the balance tips the other way: visualization and analytics are practically required to debug and tune large systems.
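
      As an example of that kind of cross-host analysis, here is a sketch that asks Prometheus's HTTP API (`/api/v1/query`) which hosts are busiest, using a PromQL aggregation over node exporter's `node_cpu_seconds_total`. The server address and the helper names are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Prometheus address; adjust to your setup.
PROMETHEUS_URL = "http://localhost:9090"

# Busy CPU cores per host: per-second rate of non-idle CPU time, summed per instance.
CPU_BUSY_BY_HOST = 'sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))'


def query_url(base, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload):
    """Turn an instant-vector response into {instance: float}."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each sample's "value" is [timestamp, "value as string"].
    return {
        r["metric"].get("instance", "?"): float(r["value"][1])
        for r in payload["data"]["result"]
    }


def rank(per_host, top=5):
    """Sort hosts by value, highest first, keeping the top N."""
    return sorted(per_host.items(), key=lambda kv: kv[1], reverse=True)[:top]


def busiest_hosts(base=PROMETHEUS_URL, top=5):
    with urllib.request.urlopen(query_url(base, CPU_BUSY_BY_HOST), timeout=10) as resp:
        return rank(parse_vector(json.load(resp)), top)
```

      In practice you would put the same PromQL expression into a Grafana panel rather than a script; the point is that one query aggregates across every host at once.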

    • io-x@alien.topB
      11 months ago

      I was looking at Loki + Grafana. Is Prometheus a replacement for Loki in this setup, and is it preferred?

      • borouhin@alien.topB
        11 months ago

        No, they serve different purposes. Loki is for logs, Prometheus is for metrics. Grafana helps to visualize data from both.
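
        To make the difference concrete: Prometheus stores numeric samples (a metric name, labels, a number), while Loki stores raw log lines grouped into streams by labels. A sketch of building a request body for Loki's push endpoint (`/loki/api/v1/push`), assuming Loki listens on its default port 3100 and using a hypothetical `job` label:

```python
import json
import time
import urllib.request


def loki_push_payload(labels, line, ts_ns=None):
    """Build a body for Loki's push endpoint.

    A stream is a label set plus [timestamp, log line] pairs; the timestamp is
    Unix time in nanoseconds, sent as a string.
    """
    ts = ts_ns if ts_ns is not None else time.time_ns()
    return {"streams": [{"stream": dict(labels), "values": [[str(ts), line]]}]}


def push_to_loki(payload, base="http://localhost:3100"):
    req = urllib.request.Request(
        f"{base}/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status  # Loki answers 204 No Content on success
```

        Usually an agent such as Promtail or Grafana Alloy does this for you; the sketch just shows that what Loki receives is text, not numbers.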

          • borouhin@alien.topB
            11 months ago

            InfluxDB is just storage (a time-series database). If you have a service that saves metrics to InfluxDB (IIRC, Proxmox can do that), Grafana can read them from there. Grafana can aggregate data from many sources: Prometheus, Loki, InfluxDB, even queries to arbitrary JSON APIs, etc.
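
            For the "service that saves metrics to InfluxDB" case, writes typically use InfluxDB's line protocol (`measurement,tags fields timestamp`). A simplified formatter sketch: it only emits float fields, assumes the measurement name needs no escaping, and skips backslash escaping. The exact write URL and authentication depend on your InfluxDB version, so sending is left out.

```python
def escape_key(s):
    # Commas, equals signs and spaces in tag keys/values and field keys are backslash-escaped.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one point as an InfluxDB line-protocol string (float fields only)."""
    tag_part = "".join(f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{escape_key(k)}={float(v)!r}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {ts_ns}"


print(to_line_protocol("cpu", {"host": "pve 1"}, {"usage": 0.5}, 1700000000000000000))
# cpu,host=pve\ 1 usage=0.5 1700000000000000000
```

            Once points like this are in InfluxDB, Grafana just needs an InfluxDB data source next to the Prometheus and Loki ones.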