Amazon CloudWatch vs Grafana: What are the differences?
Amazon CloudWatch: Monitor AWS resources and custom metrics generated by your applications and services. With Amazon CloudWatch, you gain system-wide visibility into resource utilization, application performance, and operational health. Programmatically retrieve your monitoring data, view graphs, and set alarms to help you troubleshoot, spot trends, and take automated action based on the state of your cloud environment; Grafana: Open source Graphite & InfluxDB Dashboard and Graph Editor. Grafana is a general purpose dashboard and graph composer. It's focused on providing rich ways to visualize time series metrics, mainly though graphs but supports other ways to visualize data through a pluggable panel architecture. It currently has rich support for for Graphite, InfluxDB and OpenTSDB. But supports other data sources via plugins.
Amazon CloudWatch can be classified as a tool in the "Cloud Monitoring" category, while Grafana is grouped under "Monitoring Tools".
Some of the features offered by Amazon CloudWatch are:
- Basic Monitoring for Amazon EC2 instances: ten pre-selected metrics at five-minute frequency, free of charge.
- Detailed Monitoring for Amazon EC2 instances: seven pre-selected metrics at one-minute frequency, for an additional charge.
- Amazon EBS volumes: eight pre-selected metrics at five-minute frequency, free of charge.
On the other hand, Grafana provides the following key features:
- Create, edit, save & search dashboards
- Change column spans and row heights
- Drag and drop panels to rearrange
"Monitor aws resources" is the top reason why over 70 developers like Amazon CloudWatch, while over 65 developers mention "Beautiful" as the leading cause for choosing Grafana.
Grafana is an open source tool with 29.7K GitHub stars and 5.63K GitHub forks. Here's a link to Grafana's open source repository on GitHub.
According to the StackShare community, Amazon CloudWatch has a broader approval, being mentioned in 721 company stacks & 334 developers stacks; compared to Grafana, which is listed in 577 company stacks and 325 developer stacks.
What is Amazon CloudWatch?
What is Grafana?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
What are the cons of using Amazon CloudWatch?
What are the cons of using Grafana?
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
One size definitely doesn’t fit all when it comes to open source monitoring solutions, and executing generally understood best practices in the context of unique distributed systems presents all sorts of problems. Megan Anctil, a senior engineer on the Technical Operations team at Slack gave a talk at an O’Reilly Velocity Conference sharing pain points and lessons learned at wrangling known technologies such as Icinga, Graphite, Grafana, and the Elastic Stack to best fit the company’s use cases.
At the time, Slack used a few well-known monitoring tools since it’s Technical Operations team wasn’t large enough to build an in-house solution for all of these. Nor did the team think it’s sustainable to throw money at the problem, given the volume of information processed and the not-insignificant price and rigidity of many vendor solutions. With thousands of servers across multiple regions and millions of metrics and documents being processed and indexed per second, the team had to figure out how to scale these technologies to fit Slack’s needs.
On the backend, they experimented with multiple clusters in both Graphite and ELK, distributed Icinga nodes, and more. At the same time, they’ve tried to build usability into Grafana that reflects the team’s mental models of the system and have found ways to make alerts from Icinga more insightful and actionable.
Why we spent several years building an open source, large-scale metrics alerting system, M3, built for Prometheus:
By late 2014, all services, infrastructure, and servers at Uber emitted metrics to a Graphite stack that stored them using the Whisper file format in a sharded Carbon cluster. We used Grafana for dashboarding and Nagios for alerting, issuing Graphite threshold checks via source-controlled scripts. While this worked for a while, expanding the Carbon cluster required a manual resharding process and, due to lack of replication, any single node’s disk failure caused permanent loss of its associated metrics. In short, this solution was not able to meet our needs as the company continued to grow.
To ensure the scalability of Uber’s metrics backend, we decided to build out a system that provided fault tolerant metrics ingestion, storage, and querying as a managed platform...
(GitHub : https://github.com/m3db/m3)
Data science and engineering teams at Lyft maintain several big data pipelines that serve as the foundation for various types of analysis throughout the business.
Apache Airflow sits at the center of this big data infrastructure, allowing users to “programmatically author, schedule, and monitor data pipelines.” Airflow is an open source tool, and “Lyft is the very first Airflow adopter in production since the project was open sourced around three years ago.”
There are several key components of the architecture. A web UI allows users to view the status of their queries, along with an audit trail of any modifications the query. A metadata database stores things like job status and task instance status. A multi-process scheduler handles job requests, and triggers the executor to execute those tasks.
Airflow supports several executors, though Lyft uses CeleryExecutor to scale task execution in production. Airflow is deployed to three Amazon Auto Scaling Groups, with each associated with a celery queue.
Audit logs supplied to the web UI are powered by the existing Airflow audit logs as well as Flask signal.
Datadog, Statsd, Grafana, and PagerDuty are all used to monitor the Airflow system.
I use Kibana because it ships with the ELK stack. I don't find it as powerful as Splunk however it is light years above grepping through log files. We previously used Grafana but found it to be annoying to maintain a separate tool outside of the ELK stack. We were able to get everything we needed from Kibana.
I use both Kibana and Grafana on my workplace: Kibana for logging and Grafana for monitoring. Since you already work with Elasticsearch, I think Kibana is the safest choice in terms of ease of use and variety of messages it can manage, while Grafana has still (in my opinion) a strong link to metrics
For our Predictive Analytics platform, we have used both Grafana and Kibana
- Grafana based demo video: https://www.youtube.com/watch?v=tdTB2AcU4Sg
- Kibana based reporting screenshot: https://imgur.com/vuVvZKN
predictions and ML algorithms support, so if you need them, you may be better off with Kibana . The multi-variate analysis features it provide are very unique (not available in Grafana).
For everything else, definitely Grafana . Especially the number of supported data sources, and plugins clearly makes Grafana a winner (in just visualization and reporting sense). Creating your own plugin is also very easy. The top pros of Grafana (which it does better than Kibana ) are:
- Creating and organizing visualization panels
- Templating the panels on dashboards for repetetive tasks
- Realtime monitoring, filtering of charts based on conditions and variables
- Export / Import in JSON format (that allows you to version and save your dashboard as part of git)
A huge part of our continuous deployment practices is to have granular alerting and monitoring across the platform. To do this, we run Sentry on-premise, inside our VPCs, for our event alerting, and we run an awesome observability and monitoring system consisting of StatsD, Graphite and Grafana. We have dashboards using this system to monitor our core subsystems so that we can know the health of any given subsystem at any moment. This system ties into our PagerDuty rotation, as well as alerts from some of our Amazon CloudWatch alarms (we’re looking to migrate all of these to our internal monitoring system soon).
After looking for a way to monitor or at least get a better overview of our infrastructure, we found out that Grafana (which I previously only used in ELK stacks) has a plugin available to fully integrate with Amazon CloudWatch . Which makes it way better for our use-case than the offer of the different competitors (most of them are even paid). There is also a CloudFlare plugin available, the platform we use to serve our DNS requests. Although we are a big fan of https://smashing.github.io/ (previously dashing), for now we are starting with Grafana .
analyze heap dump and many logging or traces
If you have a single server, checking log files is as easy as SSHing to it and viewing logs. When you move to the container world, with many servers, you need a place to aggregate and search through all of your logs. CloudWatch provides us with this and it was trivial to setup.
We use Grafana to view live stats relating to our servers such as memory and CPU usage. We also use Grafana to monitor our gaming servers for data such as latency and player counts. This allows us to generate effective analytics and see when problems arise.
Everyone likes graphs, right?! This isn't a tool we actively use right now, but paired with Prometheus we want to use it to have visual monitors on things like API cluster health, status, queue stats, DB/redis query and cache stats etc.
Grafana is used in combination with Prometheus to display the gathered stats and to monitor our physical servers aswell as their virtual applications. We also use Grafana to get notifications about irregularities.
Grafana takes the data from InfluxDB and presents it in a nice flexible format. Bonus points for built-in alerts and playlists (cycles through different dashboards automatically)
- Graph report with many panels and Dashboard.
- Easy to deploy, and view performance of system.
- Intergrating with many datasource: Prometheus, CloudWatch
CloudWatch is “on by default” in Amazon. And by just configuring a few alarms you can have a near-zero-cost monitoring service of your own.
- Collect metrics for Grafana.
- Alerts for AutoScale.
- Centralized-logging: rds, ec2, app logs with CloudWatch Log
CloudWatch is used to monitor various aspects of our production infrastructure deployed at Amazon.