Graphite vs Nagios: What are the differences?
Graphite: A highly scalable real-time graphing system. Graphite does two things: 1) store numeric time-series data and 2) render graphs of this data on demand.
Nagios: Complete monitoring and alerting for servers, switches, applications, and services. Nagios is a host/service/network monitoring program written in C and released under the GNU General Public License.
Graphite and Nagios belong to the "Monitoring Tools" category of the tech stack.
Some of the features offered by Graphite are:
- carbon - a Twisted daemon that listens for time-series data
- whisper - a simple database library for storing time-series data (similar in design to RRD)
- graphite webapp - a Django webapp that renders graphs on demand using Cairo
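To make the carbon bullet concrete, here is a minimal sketch of feeding a data point to Carbon over its plaintext protocol, where each line is "<metric path> <value> <timestamp>". It assumes a Carbon listener on localhost:2003 (a common default; check your own configuration), and the helper names format_metric and send_metrics are illustrative, not part of Graphite.

```python
import socket
import time
from typing import List, Optional

# Assumption: Carbon's plaintext listener on localhost:2003 (a common
# default); adjust for your deployment.
CARBON_HOST = "localhost"
CARBON_PORT = 2003


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one newline-terminated line: '<metric path> <value> <timestamp>'."""
    if " " in path:
        # Fields are space-separated, so a space in the path would corrupt the line.
        raise ValueError("metric paths must not contain spaces")
    if timestamp is None:
        timestamp = int(time.time())  # Unix epoch seconds
    return f"{path} {value} {timestamp}\n"


def send_metrics(lines: List[str], host: str = CARBON_HOST, port: int = CARBON_PORT) -> None:
    """Send preformatted lines to Carbon over one short-lived TCP connection."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))
```

Usage might look like send_metrics([format_metric("servers.web01.cpu.load", 0.42)]); in a default setup carbon then hands the point to whisper for storage, and the webapp can graph it on demand.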
On the other hand, Nagios provides the following key features:
- Monitor your entire IT infrastructure
- Spot problems before they occur
- Know immediately when problems arise
"Render any graph" is the top reason why over 14 developers like Graphite, while over 49 developers mention "It just works" as the leading cause for choosing Nagios.
Graphite and Nagios are both open source tools. Graphite, with 4.59K GitHub stars and 1.2K forks, appears to be more popular than Nagios, with 60 GitHub stars and 36 forks.
According to the StackShare community, Nagios has broader approval, being mentioned in 177 company stacks and 40 developer stacks, compared to Graphite, which is listed in 97 company stacks and 21 developer stacks.
One size definitely doesn’t fit all when it comes to open source monitoring solutions, and executing generally understood best practices in the context of unique distributed systems presents all sorts of problems. Megan Anctil, a senior engineer on the Technical Operations team at Slack, gave a talk at an O’Reilly Velocity Conference sharing pain points and lessons learned in wrangling known technologies such as Icinga, Graphite, Grafana, and the Elastic Stack to best fit the company’s use cases.
At the time, Slack used a few well-known monitoring tools, since its Technical Operations team wasn’t large enough to build an in-house solution for all of these needs. Nor did the team think it was sustainable to throw money at the problem, given the volume of information processed and the not-insignificant price and rigidity of many vendor solutions. With thousands of servers across multiple regions and millions of metrics and documents being processed and indexed per second, the team had to figure out how to scale these technologies to fit Slack’s needs.
On the backend, they experimented with multiple clusters in both Graphite and ELK, distributed Icinga nodes, and more. At the same time, they tried to build usability into Grafana that reflects the team’s mental models of the system, and found ways to make alerts from Icinga more insightful and actionable.
Why we spent several years building an open source, large-scale metrics alerting system, M3, built for Prometheus:
By late 2014, all services, infrastructure, and servers at Uber emitted metrics to a Graphite stack that stored them using the Whisper file format in a sharded Carbon cluster. We used Grafana for dashboarding and Nagios for alerting, issuing Graphite threshold checks via source-controlled scripts. While this worked for a while, expanding the Carbon cluster required a manual resharding process and, due to lack of replication, any single node’s disk failure caused permanent loss of its associated metrics. In short, this solution was not able to meet our needs as the company continued to grow.
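The paragraph mentions Nagios "issuing Graphite threshold checks via source-controlled scripts". The sketch below shows what such a script might look like; it is not Uber's code. It assumes Graphite's render API (/render with format=json, which returns series objects whose datapoints are [value, timestamp] pairs) and the standard Nagios plugin exit codes; the script name, base URL, and thresholds are hypothetical.

```python
import json
import sys
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def render_url(base: str, target: str, window: str = "-5min") -> str:
    """Build a Graphite render API URL that requests JSON output."""
    query = urllib.parse.urlencode({"target": target, "from": window, "format": "json"})
    return f"{base.rstrip('/')}/render?{query}"


def latest_value(series: List[dict]) -> Optional[float]:
    """Return the newest non-null datapoint of the first returned series."""
    if not series:
        return None
    # Datapoints are [value, timestamp] pairs, oldest first; empty buckets are null.
    for value, _timestamp in reversed(series[0]["datapoints"]):
        if value is not None:
            return float(value)
    return None


def evaluate(value: Optional[float], warn: float, crit: float) -> Tuple[int, str]:
    """Map a value onto a Nagios status, assuming higher values are worse."""
    if value is None:
        return UNKNOWN, "UNKNOWN - no recent datapoints"
    if value >= crit:
        status = CRITICAL
    elif value >= warn:
        status = WARNING
    else:
        status = OK
    return status, f"{STATUS_NAMES[status]} - value={value}"


def main(base: str, target: str, warn: float, crit: float) -> int:
    try:
        with urllib.request.urlopen(render_url(base, target), timeout=10) as resp:
            value = latest_value(json.load(resp))
    except (OSError, ValueError, KeyError, TypeError) as exc:
        # Network errors, bad JSON, or an unexpected payload shape.
        print(f"UNKNOWN - could not read {target} from Graphite: {exc}")
        return UNKNOWN
    status, message = evaluate(value, warn, crit)
    print(message)
    return status


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical usage: check_graphite.py <graphite-url> <target> <warn> <crit>
    sys.exit(main(sys.argv[1], sys.argv[2], float(sys.argv[3]), float(sys.argv[4])))
```

Returning UNKNOWN rather than OK when Graphite is unreachable or has no recent data keeps a silent metrics pipeline from looking healthy.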
To ensure the scalability of Uber’s metrics backend, we decided to build out a system that provided fault-tolerant metrics ingestion, storage, and querying as a managed platform...
(GitHub: https://github.com/m3db/m3)
A huge part of our continuous deployment practices is to have granular alerting and monitoring across the platform. To do this, we run Sentry on-premise, inside our VPCs, for our event alerting, and we run an awesome observability and monitoring system consisting of StatsD, Graphite and Grafana. We have dashboards using this system to monitor our core subsystems so that we can know the health of any given subsystem at any moment. This system ties into our PagerDuty rotation, as well as alerts from some of our Amazon CloudWatch alarms (we’re looking to migrate all of these to our internal monitoring system soon).
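In a StatsD, Graphite and Grafana stack like the one described above, application code usually emits metrics to StatsD as small UDP datagrams of the form "<name>:<value>|<type>", and StatsD aggregates them and periodically flushes the results to Graphite. The sketch below assumes a StatsD daemon on 127.0.0.1:8125 (its usual default) and only the common types c (counter), ms (timer), g (gauge) and s (set); the metric names are hypothetical.

```python
import socket
from typing import Optional

# Assumption: a StatsD daemon listening on 127.0.0.1:8125/UDP.
STATSD_ADDR = ("127.0.0.1", 8125)
METRIC_TYPES = {"c", "ms", "g", "s"}  # counter, timer, gauge, set


def statsd_line(name: str, value: float, metric_type: str,
                sample_rate: Optional[float] = None) -> str:
    """Format one StatsD datagram: '<name>:<value>|<type>[|@<rate>]'."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None and sample_rate < 1:
        # Tells StatsD the value was sampled, so it can scale counts back up.
        line += f"|@{sample_rate}"
    return line


def send(line: str, addr=STATSD_ADDR) -> None:
    """Fire-and-forget over UDP, so a missing StatsD never blocks the caller."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), addr)
```

For example, send(statsd_line("deploys.api.success", 1, "c")) counts a deploy, and send(statsd_line("api.request_time", 320, "ms")) records a timing. UDP is the usual choice here because losing an occasional datagram is cheaper than slowing down request handling.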
We use Nagios to monitor our stack and alert us when problems arise. Nagios allows us to monitor every aspect of each of our servers such as running processes, CPU usage, disk usage, and more. This means that as soon as problems arise, we can detect them and call out an engineer to resolve the issues as soon as possible.
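Checks like these are usually Nagios plugins: small programs that print a one-line status, optionally followed by performance data after a "|", and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). As a minimal sketch of the disk-usage case, assuming 80%/90% thresholds chosen purely for illustration (the stock check_disk plugin from the Nagios Plugins package already covers this in practice), with a hypothetical script and function name:

```python
import shutil
import sys
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_disk(used_pct: float, warn_pct: float = 80.0,
                  crit_pct: float = 90.0) -> Tuple[int, str]:
    """Map a usage percentage onto a Nagios status and label."""
    if used_pct >= crit_pct:
        return CRITICAL, "DISK CRITICAL"
    if used_pct >= warn_pct:
        return WARNING, "DISK WARNING"
    return OK, "DISK OK"


def check_disk_usage(path: str = "/", warn_pct: float = 80.0,
                     crit_pct: float = 90.0) -> int:
    """Print one line of plugin output and return the matching exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    if usage.total == 0:
        print(f"DISK UNKNOWN - {path} reports zero capacity")
        return UNKNOWN
    used_pct = 100.0 * usage.used / usage.total
    status, label = classify_disk(used_pct, warn_pct, crit_pct)
    # Text before '|' is for humans; after it comes performance data in
    # 'label'=value[UOM];warn;crit;min;max form.
    print(f"{label} - {used_pct:.1f}% used on {path} | "
          f"used_pct={used_pct:.1f}%;{warn_pct};{crit_pct};0;100")
    return status


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: check_disk_usage.py /var
    sys.exit(check_disk_usage(sys.argv[1]))
```

Nagios decides whether to alert and page someone from the exit code alone, so the printed text only needs to be informative for whoever is called out.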
Using cloud computing and the pay-per-use model with Graphite, we were able to analyze all of the information logs generated by the system.
We use Nagios to monitor customer instances of Bridge and proactively alert us about issues like queue sizes, downed services, errors in logs, etc.
We use Nagios-based OpsView to monitor our server farm and keep everything running smoothly.