Prometheus vs Splunk Cloud: What are the differences?
Introduction
Prometheus and Splunk Cloud are two popular tools used for monitoring and analytics in the IT industry. While they serve similar purposes, there are some key differences that set them apart.
Deployment Model: Prometheus is an open-source solution that is typically self-hosted and deployed on-premises or in public or private cloud environments. It provides users with high flexibility and control over their monitoring infrastructure. On the other hand, Splunk Cloud is a fully managed Software-as-a-Service (SaaS) offering. It is hosted and maintained by Splunk itself, relieving users of the responsibility of managing infrastructure and enabling quick setup and deployment.
Licensing: Prometheus is distributed under the Apache 2.0 open-source license, which means it is free to use, modify, and extend according to your specific requirements. Splunk Cloud, however, is a commercial product and requires a paid license. The licensing cost is based on factors such as data volume, user count, and additional features.
Data Collection: Prometheus follows a pull-based model for data collection, periodically scraping metrics over HTTP from the targets it monitors. Sources that do not expose metrics in its format natively, such as SNMP devices or JMX-instrumented Java applications, are covered by exporters. Splunk Cloud, on the other hand, supports both pull-based and push-based data collection. It can ingest data from various sources through agents, APIs, syslog, and other protocols.
Querying and Alerting: Prometheus uses a specialized query language called PromQL for data retrieval and analysis. It allows users to perform advanced queries and aggregations on the collected metrics. Prometheus also provides a built-in alerting mechanism that can trigger alerts based on defined rules. Splunk Cloud, on the other hand, offers a powerful search language called SPL (Splunk Processing Language). It provides a wide range of functions and capabilities for searching, analyzing, and visualizing data. Splunk Cloud also offers advanced alerting and monitoring features with real-time alerts, anomaly detection, and predictive analytics.
Scalability: Prometheus scales horizontally by adding more instances and splitting the scrape workload across them; a single server does not cluster on its own. Its federation feature lets one Prometheus instance aggregate data from multiple others. Splunk Cloud, being a managed service, offers scalability as part of its infrastructure. It can handle large amounts of data and scale resources as needed without requiring user intervention.
Ecosystem and Integration: Prometheus has a thriving open-source community and a rich ecosystem of exporters, plugins, and integrations. It integrates well with other tools and platforms such as Grafana for visualization and Kubernetes for container orchestration. Splunk Cloud also supports a wide range of integrations with various technologies and systems such as AWS, Azure, and Docker. It offers a marketplace of apps and add-ons to extend its functionality and integrate with third-party tools.
In summary, Prometheus offers a self-hosted, open-source monitoring solution with flexible deployment options and a wide range of integrations. Splunk Cloud, on the other hand, is a managed SaaS offering that provides quick deployment, advanced analytics capabilities, and seamless scalability. The choice between the two depends on the specific needs and preferences of the organization.
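To make the PromQL querying described above a little more concrete, here is a minimal sketch that sends an instant query to the Prometheus HTTP API (`/api/v1/query`) using only the Python standard library. The server address and the metric name `http_requests_total` are assumptions for illustration, not something taken from this comparison.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a Prometheus server reachable at this address (hypothetical).
PROMETHEUS_URL = "http://localhost:9090"

# Hypothetical metric name: per-second request rate over 5 minutes, summed per job.
EXAMPLE_QUERY = "sum by (job) (rate(http_requests_total[5m]))"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def run_query(base_url: str, promql: str, timeout: float = 5.0) -> list:
    """Run an instant query and return the list of result series."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]
```

With a server running, `run_query(PROMETHEUS_URL, EXAMPLE_QUERY)` would return one entry per `job` label value. The same expression could also back an alerting rule or a Grafana panel.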
Looking for a tool which can be used for mainly dashboard purposes, but here are the main requirements:
- Must be able to get custom data from AS400,
- Able to display automation test results,
- System monitoring / Nginx API,
- Able to get data from 3rd parties DB.
Grafana solves almost all of these, except for the AS400 data and the lack of a database to pull automation test results from.
You can look into the Prometheus instrumentation guidelines (https://prometheus.io/docs/practices/instrumentation/) and the client libraries, available in various languages (https://prometheus.io/docs/instrumenting/clientlibs/), to create the custom metric you need for AS400. Grafana can then query the newly instrumented metric and show it on the dashboard.
We would like to detect unusual config changes that can potentially cause production outage.
Examples include a new SecurityGroup allow/deny rule, an AuthZ policy change, a secret key/certificate rotation, or an IP subnet add/drop. The problem is that these activities all come from different sources, e.g., AWS IAM, Amazon EC2, internal prod services, Envoy sidecars, etc.
Which technology would be best suited to detect only the important events (not all activity) from these various sources, given that all workloads run on AWS and we also use Splunk Cloud?
For continuous monitoring and detecting unusual configuration changes, I would suggest you look into AWS Config.
AWS Config enables you to assess, audit, and evaluate the configurations of your AWS resources. Config continuously monitors and records your AWS resource configurations and allows you to automate the evaluation of recorded configurations against desired configurations. Here is a list of supported AWS resources types and resource relationships with AWS Config https://docs.aws.amazon.com/config/latest/developerguide/resource-config-reference.html
Also, as of November 2019, AWS Config supports third-party resources. You can now publish the configuration of third-party resources, such as GitHub repositories, Microsoft Active Directory resources, or any on-premises server, into AWS Config using the new API. Here is more detail: https://docs.aws.amazon.com/config/latest/developerguide/customresources.html
If you have multiple AWS accounts in your organization and want to detect changes across them: https://docs.aws.amazon.com/config/latest/developerguide/aggregate-data.html
Lastly, if you already use Splunk Cloud in your enterprise and are looking for a consolidated view, AWS Config is supported by Splunk Cloud as per their documentation too: https://aws.amazon.com/marketplace/pp/Splunk-Inc-Splunk-Cloud/B06XK299KV
While it won't detect events as they happen, a good stopgap would be to define your infrastructure config using Terraform. You can then periodically run the Terraform config against your environment (for example, with terraform plan) and alert if there are any changes.
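One way to sketch that periodic check is to run `terraform plan -detailed-exitcode`, which exits with 0 when nothing would change, 1 on error, and 2 when changes are present. The wrapper below is only an outline: the working directory is whatever holds your Terraform configuration, and how you alert on a "drift" result is left open.

```python
import subprocess
from typing import Callable

# Exit codes documented for `terraform plan -detailed-exitcode`.
NO_CHANGES, ERROR, CHANGES_PRESENT = 0, 1, 2


def run_plan(workdir: str) -> int:
    """Run `terraform plan` in `workdir` and return its exit code."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return result.returncode


def check_drift(workdir: str, plan: Callable[[str], int] = run_plan) -> str:
    """Return "in-sync" or "drift"; raise if the plan itself failed."""
    code = plan(workdir)
    if code == NO_CHANGES:
        return "in-sync"
    if code == CHANGES_PRESENT:
        return "drift"
    raise RuntimeError(f"terraform plan failed with exit code {code}")
```

Scheduling `check_drift` from cron or a CI job and notifying on "drift" gives the periodic alert described above. The `plan` parameter exists only so the logic can be exercised without Terraform installed.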
Consider using a combination of Netflix Security Monkey and Amazon GuardDuty.
You can achieve automated detection and alerting, as well as automated recovery based on policies with these tools.
For instance, you could detect SecurityGroup rule changes that allow unrestricted egress from EC2 instances and then revert those changes automatically.
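To show the kind of check involved (this is not how Security Monkey or GuardDuty implement it), here is a small sketch that flags egress rules open to any address. It assumes the security group is a dictionary shaped like the entries returned by the EC2 DescribeSecurityGroups API; the sample group ID and ports in the usage note are made up.

```python
# CIDR blocks that mean "anywhere" for IPv4 and IPv6.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def unrestricted_egress_rules(group: dict) -> list:
    """Return the egress rules in `group` that allow traffic to any address."""
    flagged = []
    for rule in group.get("IpPermissionsEgress", []):
        cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
        if any(cidr in OPEN_CIDRS for cidr in cidrs):
            flagged.append(rule)
    return flagged
```

For a group such as `{"GroupId": "sg-example", "IpPermissionsEgress": [{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]}`, the function returns that single rule. An automated response could then pass the flagged rules to the EC2 RevokeSecurityGroupEgress API, though whether to revert automatically is a policy decision.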
It's unclear from your post whether you want to detect events within the Splunk Cloud infrastructure or if you want to detect events indicated in data going to Splunk using the Splunk capabilities. If the latter, then Splunk has extremely rich capabilities in their query language and integrated alerting functions. With Splunk you can also run arbitrary Python scripts in response to certain events, so what you can't analyze and alert on with native functionality or plugins, you could write code to achieve.
There are clear advantages to either tool; it all boils down to what exactly you are trying to achieve, i.e., do you want proactive monitoring, or do you want to debug an incident/issue? Splunk is definitely superior for proactively monitoring your logs for unusual events, but getting the CloudTrail logs across to Splunk requires a not-so-straightforward setup (Splunk has a blueprint for this setup which uses AWS Kinesis/Firehose). CloudTrail, on the other hand, is available out of the box from AWS, and the setup is quite simple and straightforward. But analysing the logs could require you to set up Glue crawlers, and you might have to use AWS Athena to run SQL-like queries.
Refer: https://docs.aws.amazon.com/athena/latest/ug/cloudtrail-logs.html
In my personal experience, the cost/effort involved in setting up Splunk is not worth it for smaller workloads, whereas AWS CloudTrail/Glue/Athena would be a comparatively less expensive setup.
Alternatively, you could look at something like Sumo Logic, which has better integration with CloudTrail than Splunk does. Hope that helps.
I'd recommend using CloudTrail; it helped me a lot. But depending on your situation, I'd recommend building a custom solution (like AWS's amazon-ssm-agent) which, on a configuration change, makes an API call and logs the change in Grafana or Kibana.
Hi, we have a situation where we are using Prometheus to get system metrics from the PCF (Pivotal Cloud Foundry) platform. We send these as time-series data to Cortex via a Prometheus server and have built a dashboard using Grafana. There is another pipeline where we need to read metrics (CPU, memory, and disk) from a Linux server using Metricbeat. These will be sent to Elasticsearch, and Grafana will pull and show the data in a dashboard.
Is it OK to use Metricbeat for the Linux server, or can we use Prometheus?
What is the difference in system metrics sent by Metricbeat and Prometheus node exporters?
Regards, Sunil.
If you're already using Prometheus for your system metrics, then it seems like standing up Elasticsearch just for Linux host monitoring is excessive. The node_exporter is probably sufficient if you're looking for standard system metrics.
Another thing to consider is that Metricbeat / ELK use a push model for metrics delivery, whereas Prometheus pulls metrics from each node it is monitoring. Depending on how you manage your network security, opting for one solution over two may make things simpler.
Hi Sunil! Unfortunately, I don't have much experience with Metricbeat, so I can't advise on the differences from Prometheus. For the Linux server, I encourage you to use the Prometheus node exporter, and for PCF, I would recommend using the Instana tile (https://www.instana.com/supported-technologies/pivotal-cloud-foundry/). Let me know if you have further questions! Regards, Jose
We're looking for a monitoring and logging tool. It has to support AWS (mostly 100% serverless: Lambdas, SNS, SQS, API GW, CloudFront, Aurora, etc.), as well as Azure and GCP (for now mostly used as pure IaaS, with a lot of cognitive services, and mostly managed DBs). Hopefully something not as expensive as Datadog or New Relic, as our SRE team could support the tool in-house. At the moment, we primarily use CloudWatch for AWS and Pandora for most on-prem.
This is quite affordable and provides what you seem to be looking for. You can see a full overview of the APM space here: https://www.apmexperts.com/observability/ranking-the-observability-offerings/
I worked with Datadog for at least a year, and my position is that commercial tools like Datadog are the best option to consolidate and analyze your metrics. Obviously, if you can't pay for the tool, the best free option is the combination of Prometheus with its Alertmanager and Grafana for visualization (they are complementary, not substitutes). But I think that a mediocre implementation of free tools ends up more expensive than using a good tool, and you also pay to maintain it.
Pros of Prometheus
- Powerful easy to use monitoring (47)
- Flexible query language (38)
- Dimensional data model (32)
- Alerts (27)
- Active and responsive community (23)
- Extensive integrations (22)
- Easy to setup (19)
- Beautiful Model and Query language (12)
- Easy to extend (7)
- Nice (6)
- Written in Go (3)
- Good for experimentation (2)
- Easy for monitoring (1)
Pros of Splunk Cloud
- More powerful & Integrates with on-prem & off-prem (7)
- Free (3)
- Powerful log analytics (3)
- PCI compliance (1)
- Production debugger (1)
Cons of Prometheus
- Just for metrics (12)
- Bad UI (6)
- Needs monitoring to access metrics endpoints (6)
- Not easy to configure and use (4)
- Supports only active agents (3)
- Written in Go (2)
- TLS is quite difficult to understand (2)
- Requires multiple applications and tools (2)
- Single point of failure (1)