What is Nagios?
Who uses Nagios?
Why do developers like Nagios?
Here are some stack decisions, common use cases, and reviews from companies and developers who chose Nagios for their tech stack.
Why we spent several years building an open source, large-scale metrics alerting system, M3, built for Prometheus:
By late 2014, all services, infrastructure, and servers at Uber emitted metrics to a Graphite stack that stored them using the Whisper file format in a sharded Carbon cluster. We used Grafana for dashboarding and Nagios for alerting, issuing Graphite threshold checks via source-controlled scripts. While this worked for a while, expanding the Carbon cluster required a manual resharding process and, due to lack of replication, any single node’s disk failure caused permanent loss of its associated metrics. In short, this solution was not able to meet our needs as the company continued to grow.
To ensure the scalability of Uber’s metrics backend, we decided to build out a system that provided fault tolerant metrics ingestion, storage, and querying as a managed platform...
(GitHub: https://github.com/m3db/m3)
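The account above mentions Nagios issuing Graphite threshold checks via scripts. As a rough illustration only (not Uber's actual scripts), a check of that kind might look like the sketch below. It assumes a Graphite server that exposes the standard `/render` endpoint with `format=json`, and it uses the conventional Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names, the five-minute window, and the command-line argument order are hypothetical.

```python
import json
import sys
from urllib.parse import urlencode
from urllib.request import urlopen

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def latest_value(series):
    """Return the newest non-null value from the first series that has one."""
    for item in series:
        # Graphite datapoints are [value, timestamp] pairs, oldest first.
        for value, _timestamp in reversed(item.get("datapoints", [])):
            if value is not None:
                return value
    return None


def evaluate(value, warn, crit):
    """Map a metric value to a Nagios (exit_code, message) pair."""
    if value is None:
        return UNKNOWN, "UNKNOWN - no datapoints returned"
    if value >= crit:
        return CRITICAL, f"CRITICAL - value {value} >= {crit}"
    if value >= warn:
        return WARNING, f"WARNING - value {value} >= {warn}"
    return OK, f"OK - value {value}"


def fetch_series(base_url, target, window="-5min"):
    """Query Graphite's render API for recent datapoints as JSON."""
    query = urlencode({"target": target, "from": window, "format": "json"})
    with urlopen(f"{base_url}/render?{query}", timeout=10) as response:
        return json.load(response)


def main(argv):
    # Hypothetical usage: check_graphite.py <base_url> <target> <warn> <crit>
    if len(argv) != 5:
        print("UNKNOWN - usage: check_graphite.py <base_url> <target> <warn> <crit>")
        return UNKNOWN
    base_url, target = argv[1], argv[2]
    try:
        warn, crit = float(argv[3]), float(argv[4])
        series = fetch_series(base_url, target)
    except Exception as exc:  # bad thresholds, network, or parsing failure
        print(f"UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(latest_value(series), warn, crit)
    print(message)
    return code
```

When saved as a script, it would end with `sys.exit(main(sys.argv))` so that Nagios can read the exit status. Keeping `evaluate` separate from the network call makes the threshold logic easy to test and to keep under source control.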
We use Nagios to monitor our stack and alert us when problems arise. Nagios allows us to monitor every aspect of each of our servers such as running processes, CPU usage, disk usage, and more. This means that as soon as problems arise, we can detect them and call out an engineer to resolve the issues as soon as possible.
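To make the disk usage case above concrete, here is a minimal sketch of a Nagios-style check plugin. It is an illustration under assumptions, not the reviewer's setup: the function names and the default thresholds of 80% and 90% are hypothetical. The output line follows the usual plugin convention of status text, then a pipe, then performance data.

```python
import shutil

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(pct_used, warn, crit):
    """Return (exit_code, label) for a disk usage percentage."""
    if pct_used >= crit:
        return CRITICAL, "CRITICAL"
    if pct_used >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Check the filesystem containing `path` against the thresholds."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct_used = 100.0 * usage.used / usage.total
    code, label = classify(pct_used, warn, crit)
    # Status text first, then performance data after the pipe.
    message = (
        f"DISK {label} - {path} is {pct_used:.1f}% full"
        f" | used_pct={pct_used:.1f}%;{warn};{crit}"
    )
    return code, message
```

A wrapper script would print the message and exit with the returned code, for example `code, msg = check_disk("/var"); print(msg); sys.exit(code)`.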
We use Nagios-based Opsview to monitor our server farm and keep everything running smoothly.
We use Nagios to monitor customer instances of Bridge and proactively alert us about issues like queue sizes, downed services, errors in logs, etc.
Each piece of our infrastructure is monitored using Nagios, alerting us immediately if anything goes wrong (hopefully before anyone else notices), and with a level of granularity that really helps in resolving things quickly when things are on fire.
- Monitor your entire IT infrastructure
- Spot problems before they occur
- Know immediately when problems arise
- Share availability data with stakeholders
- Detect security breaches
- Plan and budget for IT upgrades
- Reduce downtime and business losses