What is Ganglia?
What is Hund?
Why do developers choose Ganglia?
Why do developers choose Hund?
What are the cons of using Ganglia?
What are the cons of using Hund?
Why we spent several years building an open source, large-scale metrics alerting system, M3, built for Prometheus:
By late 2014, all services, infrastructure, and servers at Uber emitted metrics to a Graphite stack that stored them using the Whisper file format in a sharded Carbon cluster. We used Grafana for dashboarding and Nagios for alerting, issuing Graphite threshold checks via source-controlled scripts. While this worked for a while, expanding the Carbon cluster required a manual resharding process and, due to lack of replication, any single node’s disk failure caused permanent loss of its associated metrics. In short, this solution was not able to meet our needs as the company continued to grow.
To ensure the scalability of Uber’s metrics backend, we decided to build out a system that provided fault-tolerant metrics ingestion, storage, and querying as a managed platform...
(GitHub: https://github.com/m3db/m3)
We use Nagios to monitor our stack and alert us when problems arise. Nagios allows us to monitor every aspect of each of our servers, such as running processes, CPU usage, disk usage, and more. This means we can detect problems as soon as they arise and call out an engineer to resolve them quickly.
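For readers unfamiliar with how such checks work, here is a minimal sketch of a disk-usage check written in the style of a Nagios plugin. It relies only on the standard plugin convention of exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a one-line status message. The function names `classify` and `check_disk` and the 80%/90% thresholds are assumptions chosen for illustration, not anything described in the post above.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a (status code, label) pair.

    The default thresholds are illustrative assumptions.
    """
    if used_percent >= crit:
        return CRITICAL, "CRITICAL"
    if used_percent >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, message) for the filesystem containing `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Unreadable or missing path: report UNKNOWN rather than guess.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero total size"
    used_percent = 100.0 * usage.used / usage.total
    code, label = classify(used_percent, warn, crit)
    message = (
        f"DISK {label} - {used_percent:.1f}% used on {path}"
        f" | used_pct={used_percent:.1f}%"
    )
    return code, message
```

A wrapper script would print the returned message and exit with the returned code, which is what Nagios reads to decide whether to raise an alert.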
We use Nagios to monitor customer instances of Bridge and proactively alert us about issues like queue sizes, downed services, errors in logs, etc.
We use Nagios-based OpsView to monitor our server farm and keep everything running smoothly.