Grafana vs Prometheus vs Splunk Cloud: What are the differences?
Introduction
This page compares Grafana, Prometheus, and Splunk Cloud, highlighting their key differences.
Data Source Compatibility: Grafana supports multiple data sources, including Prometheus, InfluxDB, Elasticsearch, and more. Prometheus, on the other hand, is specifically built for monitoring and time-series data collection. Splunk Cloud is a cloud-based platform that can ingest data from various sources, such as log files, metrics, and events, providing a broader range of compatibility compared to Grafana and Prometheus.
Data Visualization Capabilities: Grafana excels in data visualization, offering a wide range of intuitive and customizable visualization options, including graphs, charts, tables, and dashboards. Prometheus, on the other hand, provides basic graphical representations but focuses more on monitoring and alerting capabilities. Splunk Cloud also offers robust visualization capabilities, enabling users to create reports, dashboards, and visualizations in real-time.
Alerting and Monitoring Features: Grafana provides powerful alerting capabilities, allowing users to set up alerts based on various conditions and thresholds. Prometheus, being specifically developed for monitoring, offers extensive alerting and monitoring features, including alerting rules, the Alertmanager component, and advanced querying options. Splunk Cloud also provides alerting and monitoring features, enabling users to proactively monitor and manage their data, logs, and metrics.
Scalability and Performance: Grafana is known for its scalability, supporting high volumes of data and users. However, it heavily relies on the underlying time-series database, such as Prometheus or InfluxDB, for data storage and retrieval. Prometheus is designed to handle massive amounts of time-series data and is highly scalable, allowing it to collect data from thousands of nodes. Splunk Cloud offers auto-scaling capabilities, enabling users to handle large workloads efficiently.
Ease of Use and Configuration: Grafana provides a user-friendly interface with drag-and-drop functionality, making it easy to create and configure visualizations and dashboards. Prometheus, although powerful, has a steeper learning curve as it requires defining and configuring exporters, jobs, and alerting rules. Splunk Cloud offers a comprehensive user interface that simplifies data management, search, and analytics, making it more user-friendly compared to Prometheus.
Cost and Deployment Options: Grafana is open-source and free to use, making it an attractive option for small to mid-sized organizations. Prometheus is also open-source and free, but scaling and operating it may require additional resources. Splunk Cloud is a commercial, cloud-hosted service with usage-based pricing, which can mean higher costs; on-premises and hybrid deployments are covered by the separate Splunk Enterprise product.
In summary, Grafana is a feature-rich data visualization tool that works with many data sources, Prometheus is specialized for monitoring and alerting with robust scalability, and Splunk Cloud is a comprehensive platform offering broader data compatibility and ease of use.
I'm looking for a tool to be used mainly for dashboards. The main requirements are:
- Must be able to get custom data from AS400,
- Able to display automation test results,
- System monitoring / Nginx API,
- Able to get data from third-party databases.
Grafana solves almost all of these, except for the AS400 data and the lack of a database to hold the automation test results.
You can look at Prometheus instrumentation (https://prometheus.io/docs/practices/instrumentation/). Client libraries are available in various languages (https://prometheus.io/docs/instrumenting/clientlibs/); use one to create the custom metric you need for AS400, and Grafana can then query the newly instrumented metric and show it on the dashboard.
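As a rough illustration of what such a custom metric looks like once it is exposed, the sketch below renders a gauge in the Prometheus text exposition format using only the Python standard library. In practice one of the client libraries linked above would handle this for you. The metric name `as400_job_queue_depth`, its `queue` label, and the sample value are hypothetical placeholders, not real AS400 fields.

```python
def escape_label_value(value):
    """Escape backslashes, quotes, and newlines as the text format requires."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_gauge(name, help_text, samples):
    """Render one gauge metric in the Prometheus text exposition format.

    samples is a list of (labels_dict, numeric_value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        if label_str:
            lines.append(f"{name}{{{label_str}}} {float(value)}")
        else:
            lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric: in a real exporter the value would come from AS400.
    print(render_gauge(
        "as400_job_queue_depth",
        "Jobs waiting in an AS400 job queue (hypothetical metric).",
        [({"queue": "QBATCH"}, 12)],
    ))
```

Serving this text from an HTTP endpoint that Prometheus scrapes is what makes the metric queryable from Grafana.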
We would like to detect unusual config changes that could cause a production outage.
Examples include a new SecurityGroup allow/deny rule, an AuthZ policy change, a secret key or certificate rotation, or an IP subnet being added or dropped. The problem is that these activities come from different sources, i.e., AWS IAM, Amazon EC2, internal prod services, the envoy sidecar, etc.
Which technology would be best suited to detecting only the important events (not all activity) from these various sources, for workloads running on AWS and also Splunk Cloud?
For continuous monitoring and detecting unusual configuration changes, I would suggest you look into AWS Config.
AWS Config enables you to assess, audit, and evaluate the configurations of your AWS resources. Config continuously monitors and records your AWS resource configurations and allows you to automate the evaluation of recorded configurations against desired configurations. Here is a list of the AWS resource types and resource relationships supported by AWS Config: https://docs.aws.amazon.com/config/latest/developerguide/resource-config-reference.html
Also, as of November 2019, AWS Config supports third-party resources. You can now publish the configuration of third-party resources, such as GitHub repositories, Microsoft Active Directory resources, or any on-premises server, into AWS Config using the new API. Here is more detail: https://docs.aws.amazon.com/config/latest/developerguide/customresources.html
If you have multiple AWS accounts in your organization and want to detect changes across them: https://docs.aws.amazon.com/config/latest/developerguide/aggregate-data.html
Lastly, if you already use Splunk Cloud in your enterprise and are looking for a consolidated view, AWS Config is also supported by Splunk Cloud, as per their documentation: https://aws.amazon.com/marketplace/pp/Splunk-Inc-Splunk-Cloud/B06XK299KV
While it won't detect events as they happen, a good stopgap would be to define your infrastructure config using Terraform. You can then periodically run the Terraform config against your environment and alert if there are any changes.
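One way to sketch this periodic check is to run `terraform plan -detailed-exitcode`, which exits with 0 when there is no diff, 1 on error, and 2 when a diff is present. The snippet below is a minimal sketch under that assumption; the function names and the idea of running it from a scheduler are illustrative, and a non-empty plan can also mean unapplied config changes rather than drift in the environment.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_STATUS = {0: "in_sync", 1: "error", 2: "changes_present"}


def classify_plan_exit(code):
    """Map a plan exit code to a status; unknown codes count as errors."""
    return PLAN_STATUS.get(code, "error")


def check_drift(workdir):
    """Run a plan in workdir and return (status, plan_output).

    Assumes the terraform binary is on PATH and the directory has
    already been initialised with `terraform init`.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode), result.stdout
```

A scheduled job could call `check_drift` and send an alert whenever the status is `changes_present`, attaching the plan output for review.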
Consider using a combination of Netflix Security Monkey and Amazon GuardDuty.
You can achieve automated detection and alerting, as well as automated recovery based on policies with these tools.
For instance, you could detect SecurityGroup rule changes that allow unrestricted egress from EC2 instances and then revert those changes automatically.
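To make the detection half of that example concrete, here is a hand-rolled sketch that is not how Security Monkey or GuardDuty work internally. It assumes the security group is a dictionary shaped like one entry of the `SecurityGroups` list returned by the EC2 `DescribeSecurityGroups` API; the group ID and rules in the usage example are made up.

```python
# CIDR blocks that mean "any destination" for IPv4 and IPv6.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_egress_rules(group):
    """Return the egress permissions of one security group that allow any destination."""
    findings = []
    for perm in group.get("IpPermissionsEgress", []):
        cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
        if OPEN_CIDRS.intersection(cidrs):
            findings.append(perm)
    return findings


if __name__ == "__main__":
    # Made-up group: one wide-open rule and one restricted rule.
    sample_group = {
        "GroupId": "sg-0123456789abcdef0",
        "IpPermissionsEgress": [
            {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
        ],
    }
    for rule in open_egress_rules(sample_group):
        print(sample_group["GroupId"], "allows unrestricted egress:", rule)
```

The automated revert would then be a separate step, for example revoking the flagged permissions through the EC2 API. Note that a newly created security group allows all egress by default, so a check like this is most useful on groups that are expected to be locked down.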
It's unclear from your post whether you want to detect events within the Splunk Cloud infrastructure or if you want to detect events indicated in data going to Splunk using the Splunk capabilities. If the latter, then Splunk has extremely rich capabilities in their query language and integrated alerting functions. With Splunk you can also run arbitrary Python scripts in response to certain events, so what you can't analyze and alert on with native functionality or plugins, you could write code to achieve.
There are clear advantages to either tool; it all boils down to what exactly you are trying to achieve, i.e., do you want proactive monitoring or do you want to debug an incident or issue? Splunk is definitely superior for proactively monitoring your logs for unusual events, but getting the CloudTrail logs across to Splunk requires some not-so-straightforward setup (Splunk has a blueprint for this setup that uses Amazon Kinesis Data Firehose). CloudTrail, on the other hand, is available out of the box from AWS, and the setup is simple and straightforward. But analysing the logs could require you to set up Glue crawlers, and you might have to use Amazon Athena to run SQL-like queries.
Refer: https://docs.aws.amazon.com/athena/latest/ug/cloudtrail-logs.html
In my personal experience, the cost and effort involved in setting up Splunk is not worth it for smaller workloads, whereas the AWS CloudTrail/Glue/Athena setup would be comparatively less expensive.
Alternatively, you could look at something like Sumo Logic, which has better integration with CloudTrail than Splunk does. Hope that helps.
I'd recommend using CloudTrail; it helped me a lot. But depending on your situation, I'd recommend building a custom solution (like the AWS amazon-ssm-agent) that makes an API call on each configuration change and logs it in Grafana or Kibana.
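Whichever store the CloudTrail records end up in, the "only important events" requirement comes down to filtering on a few record fields. The sketch below assumes a CloudTrail log file with a top-level `Records` array whose entries carry `eventSource` and `eventName`; the watch list is an illustrative guess at what counts as important, not a recommended or complete set.

```python
import json

# Illustrative watch list: event source -> event names worth alerting on.
WATCHED_EVENTS = {
    "ec2.amazonaws.com": {
        "AuthorizeSecurityGroupIngress",
        "AuthorizeSecurityGroupEgress",
        "RevokeSecurityGroupIngress",
        "RevokeSecurityGroupEgress",
        "CreateSubnet",
        "DeleteSubnet",
    },
    "iam.amazonaws.com": {
        "PutRolePolicy",
        "AttachRolePolicy",
        "CreateAccessKey",
        "DeleteAccessKey",
    },
}


def important_events(log_text):
    """Return only the CloudTrail records whose source and name are on the watch list."""
    records = json.loads(log_text).get("Records", [])
    return [
        record for record in records
        if record.get("eventName") in WATCHED_EVENTS.get(record.get("eventSource"), set())
    ]
```

The same filter could be expressed as a `WHERE` clause in Athena or as a search in Splunk; the Python version just makes the selection logic explicit.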
Hi, we have a situation where we are using Prometheus to get system metrics from the PCF (Pivotal Cloud Foundry) platform. We send those as time-series data to Cortex via a Prometheus server and have built a dashboard using Grafana. There is another pipeline where we need to read CPU, memory, and disk metrics from a Linux server using Metricbeat. Those will be sent to Elasticsearch, and Grafana will pull the data and show it in a dashboard.
Is it OK to use Metricbeat for the Linux server, or can we use Prometheus?
What is the difference in system metrics sent by Metricbeat and Prometheus node exporters?
Regards, Sunil.
If you're already using Prometheus for your system metrics, then standing up Elasticsearch just for Linux host monitoring seems excessive. The node_exporter is probably sufficient if you're looking for standard system metrics.
Another thing to consider is that Metricbeat / ELK use a push model for metrics delivery, whereas Prometheus pulls metrics from each node it is monitoring. Depending on how you manage your network security, opting for one solution over two may make things simpler.
Hi Sunil! Unfortunately, I don't have much experience with Metricbeat, so I can't advise on the differences from Prometheus. For the Linux server, I encourage you to use the Prometheus node exporter, and for PCF, I would recommend using the Instana tile (https://www.instana.com/supported-technologies/pivotal-cloud-foundry/). Let me know if you have further questions! Regards, Jose
We're looking for a monitoring and logging tool. It has to support AWS (almost 100% serverless: Lambdas, SNS, SQS, API GW, CloudFront, Aurora, etc.), as well as Azure and GCP (for now mostly used as pure IaaS, with a lot of cognitive services and mostly managed DBs). Hopefully something not as expensive as Datadog or New Relic, as our SRE team could support the tool in-house. At the moment, we primarily use CloudWatch for AWS and Pandora for most on-prem.
This is quite affordable and provides what you seem to be looking for. You can see a whole overview of the APM space here: https://www.apmexperts.com/observability/ranking-the-observability-offerings/
I worked with Datadog for at least one year, and my position is that commercial tools like Datadog are the best option to consolidate and analyze your metrics. Obviously, if you can't pay for the tool, the best free option is the mix of Prometheus with its Alertmanager and Grafana for visualization (they are complementary, not interchangeable). But I think that going without a good tool ends up more expensive, because you are left with a mediocre implementation of free tools that you also have to pay to maintain.
From a StackShare Community member: “We need better analytics & insights into our Elasticsearch cluster. Grafana, which ships with advanced support for Elasticsearch, looks great but isn’t officially supported/endorsed by Elastic. Kibana, on the other hand, is made and supported by Elastic. I’m wondering what people suggest in this situation.”
For our Predictive Analytics platform, we have used both Grafana and Kibana
- Grafana based demo video: https://www.youtube.com/watch?v=tdTB2AcU4Sg
- Kibana based reporting screenshot: https://imgur.com/vuVvZKN
Kibana has predictions and ML algorithm support, so if you need them, you may be better off with Kibana. The multivariate analysis features it provides are unique (not available in Grafana).
For everything else, definitely Grafana. The number of supported data sources and plugins in particular makes Grafana the clear winner (purely in the visualization and reporting sense). Creating your own plugin is also very easy. The top pros of Grafana (which it does better than Kibana) are:
- Creating and organizing visualization panels
- Templating the panels on dashboards for repetitive tasks
- Realtime monitoring, filtering of charts based on conditions and variables
- Export / Import in JSON format (that allows you to version and save your dashboard as part of git)
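The last point, versioning dashboards in git, works best when exports are written in a stable form so that diffs show only real changes. The sketch below assumes an exported dashboard is a single JSON object with top-level `id` and `version` fields; treating those two fields as volatile and dropping them is my assumption, not something the post prescribes.

```python
import json

# Top-level fields assumed to change between exports without the
# dashboard itself changing (an assumption for this sketch).
VOLATILE_KEYS = ("id", "version")


def normalize_dashboard(exported_json):
    """Return a dashboard export with volatile fields removed and keys sorted."""
    dashboard = json.loads(exported_json)
    for key in VOLATILE_KEYS:
        dashboard.pop(key, None)
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"
```

Writing the normalized text to a file in the repository before each commit keeps the history readable, at the cost of having to let Grafana reassign the dropped fields on import.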
I use both Kibana and Grafana at my workplace: Kibana for logging and Grafana for monitoring. Since you already work with Elasticsearch, I think Kibana is the safest choice in terms of ease of use and the variety of messages it can manage, while Grafana still has (in my opinion) a strong link to metrics.
After looking for a way to monitor, or at least get a better overview of, our infrastructure, we found out that Grafana (which I previously only used in ELK stacks) has a plugin available to fully integrate with Amazon CloudWatch, which makes it far better for our use case than what the various competitors offer (most of which are paid). There is also a plugin available for Cloudflare, the platform we use to serve our DNS requests. Although we are big fans of https://smashing.github.io/ (previously Dashing), for now we are starting with Grafana.
I use Kibana because it ships with the ELK stack. I don't find it as powerful as Splunk; however, it is light years ahead of grepping through log files. We previously used Grafana but found it annoying to maintain a separate tool outside of the ELK stack. We were able to get everything we needed from Kibana.
Kibana should be sufficient in this architecture for decent analytics; if stronger metrics are needed, then combine it with Grafana. Datadog also offers a nice overview, but there's no need for it in this case unless you need more monitoring and alerting (and more technicalities).
@Kibana, of course, because @Grafana looks like an amateur sort of solution, crammed with a query builder for grouping aggregates, but in essence, as recommended by CERN, Kibana is the corporate (startup-vectored) decision.
Furthermore, @Kibana comes with the complexity of adhering to the ELK stack, whereas @InfluxDB + @Grafana & co. have recently become a sophisticated development conglomerate instead of advancing towards an understandable, step-by-step installation.
Pros of Grafana
- Beautiful (89)
- Graphs are interactive (68)
- Free (57)
- Easy (56)
- Nicer than the Graphite web interface (34)
- Many integrations (26)
- Can build dashboards (18)
- Easy to specify time window (10)
- Can collaborate on dashboards (10)
- Dashboards contain number tiles (9)
- Open Source (5)
- Integration with InfluxDB (5)
- Click and drag to zoom in (5)
- Authentication and user management (4)
- Threshold limits in graphs (4)
- Alerts (3)
- Works with CloudWatch and many databases (3)
- Simple and native support for Prometheus (3)
- Great community support (2)
- You can use this for development to check memcache (2)
- You can visualize real-time data to set up alerts (2)
- Graphs as code (0)
- Plugin visualizations (0)
Pros of Prometheus
- Powerful, easy-to-use monitoring (47)
- Flexible query language (38)
- Dimensional data model (32)
- Alerts (27)
- Active and responsive community (23)
- Extensive integrations (22)
- Easy to set up (19)
- Beautiful model and query language (12)
- Easy to extend (7)
- Nice (6)
- Written in Go (3)
- Good for experimentation (2)
- Easy for monitoring (1)
Pros of Splunk Cloud
- More powerful & integrates with on-prem & off-prem (7)
- Free (3)
- Powerful log analytics (3)
- PCI compliance (1)
- Production debugger (1)
Cons of Grafana
- No interactive query builder (1)
Cons of Prometheus
- Just for metrics (12)
- Bad UI (6)
- Needs monitoring to access metrics endpoints (6)
- Not easy to configure and use (4)
- Supports only active agents (3)
- Written in Go (2)
- TLS is quite difficult to understand (2)
- Requires multiple applications and tools (2)
- Single point of failure (1)