Need advice about which tool to choose?Ask the StackShare community!
Datadog vs Prometheus vs Zabbix: What are the differences?
Scalability: Datadog is designed to scale easily with a large number of hosts, making it suitable for enterprises with complex infrastructures. Prometheus is more suitable for smaller setups as it requires manual sharding for scaling. Zabbix, on the other hand, is known for its ability to handle a large number of monitored devices efficiently.
Data Storage: Datadog offers a cloud-based solution for data storage, making it easy to scale and access data from anywhere. Prometheus stores data locally, which can be a limitation for large-scale setups that require centralized data access. Zabbix allows users to choose between local and cloud storage options, providing flexibility based on specific requirements.
Alerting Capabilities: Datadog provides robust alerting features with customizable thresholds and notification mechanisms. Prometheus has basic alerting capabilities that require additional configurations for advanced setups. Zabbix offers extensive alerting functionality with support for escalations, multiple notification methods, and acknowledgment mechanisms.
Monitoring Capabilities: Datadog offers a wide range of integrations and plugins for monitoring various technologies and services, making it a versatile solution. Prometheus focuses on monitoring cloud-native environments and is highly suited for Kubernetes ecosystem monitoring. Zabbix, on the other hand, is known for its comprehensive monitoring capabilities for IT infrastructure components.
Ease of Use: Datadog is known for its user-friendly interface and easy setup process, making it ideal for organizations looking for a quick deployment. Prometheus requires more configuration and expertise to set up, making it better suited for technically skilled teams. Zabbix provides a balance between ease of use and advanced monitoring capabilities, catering to both beginners and experienced users.
Community Support: Datadog has a strong community support system with extensive documentation and resources available for users. Prometheus has an active open-source community backing it up, providing continuous updates and new features. Zabbix boasts a dedicated community forum and extensive knowledge base, offering support and guidance for users at all levels of expertise.
In Summary, Datadog, Prometheus, and Zabbix each offer unique strengths and cater to different monitoring needs based on factors such as scalability, data storage, alerting capabilities, monitoring focus, user-friendliness, and community support.
Looking for a tool which can be used for mainly dashboard purposes, but here are the main requirements:
- Must be able to get custom data from AS400,
- Able to display automation test results,
- System monitoring / Nginx API,
- Able to get data from 3rd parties DB.
Grafana is almost solving all the problems, except AS400 and no database to get automation test results.
You can look out for Prometheus Instrumentation (https://prometheus.io/docs/practices/instrumentation/) Client Library available in various languages https://prometheus.io/docs/instrumenting/clientlibs/ to create the custom metric you need for AS4000 and then Grafana can query the newly instrumented metric to show on the dashboard.
Hey there! We are looking at Datadog, Dynatrace, AppDynamics, and New Relic as options for our web application monitoring.
Current Environment: .NET Core Web app hosted on Microsoft IIS
Future Environment: Web app will be hosted on Microsoft Azure
Tech Stacks: IIS, RabbitMQ, Redis, Microsoft SQL Server
Requirement: Infra Monitoring, APM, Real - User Monitoring (User activity monitoring i.e., time spent on a page, most active page, etc.), Service Tracing, Root Cause Analysis, and Centralized Log Management.
Please advise on the above. Thanks!
We are looking for a centralised monitoring solution for our application deployed on Amazon EKS. We would like to monitor using metrics from Kubernetes, AWS services (NeptuneDB, AWS Elastic Load Balancing (ELB), Amazon EBS, Amazon S3, etc) and application microservice's custom metrics.
We are expected to use around 80 microservices (not replicas). I think a total of 200-250 microservices will be there in the system with 10-12 slave nodes.
We tried Prometheus but it looks like maintenance is a big issue. We need to manage scaling, maintaining the storage, and dealing with multiple exporters and Grafana. I felt this itself needs few dedicated resources (at least 2-3 people) to manage. Not sure if I am thinking in the correct direction. Please confirm.
You mentioned Datadog and Sysdig charges per host. Does it charge per slave node?
Can't say anything to Sysdig. I clearly prefer Datadog as
- they provide plenty of easy to "switch-on" plugins for various technologies (incl. most of AWS)
- easy to code (python) agent plugins / api for own metrics
- brillant dashboarding / alarms with many customization options
- pricing is OK, there are cheaper options for specific use cases but if you want superior dashboarding / alarms I haven't seen a good competitor (despite your own Prometheus / Grafana / Kibana dog food)
IMHO NewRelic is "promising since years" ;) good ideas but bad integration between their products. Their Dashboard query language is really nice but lacks critical functions like multiple data sets or advanced calculations. Needless to say you get all of that with Datadog.
Need help setting up a monitoring / logging / alarm infrastructure? Send me a message!
Hi Medeti,
you are right. Building based on your stack something with open source is heavy lifting. A lot of people I know start with such a set-up, but quickly run into frustration as they need to dedicated their best people to build a monitoring which is doing the job in a professional way.
As you are microservice focussed and are looking for 'low implementation and maintenance effort', you might want to have a look at INSTANA, which was built with modern tool stacks in mind. https://www.instana.com/apm-for-microservices/
We have a public sand-box available if you just want to have a look at the product once and of course also a free-trial: https://www.instana.com/getting-started-with-apm/
Let me know if you need anything on top.
I have hands on production experience both with New Relic and Datadog. I personally prefer Datadog over NewRelic because of the UI, the Documentation and the overall user/developer experience.
NewRelic however, can do basically the same things as Datadog can, and some of the features like alerting have been present in NewRelic for longer than in Datadog. The cool thing about NewRelic is their last-summer-updated pricing: you no longer pay per host but after data you send towards New Relic. This can be a huge cost saver depending on your particular setup
I'd go for Datadog, but given you have lots of containers I would also make a cost calculation. If the price difference is significant and there's a budget constraint NewRelic might be the better choice.
Hi, We have a situation, where we are using Prometheus to get system metrics from PCF (Pivotal Cloud Foundry) platform. We send that as time-series data to Cortex via a Prometheus server and built a dashboard using Grafana. There is another pipeline where we need to read metrics from a Linux server using Metricbeat, CPU, memory, and Disk. That will be sent to Elasticsearch and Grafana will pull and show the data in a dashboard.
Is it OK to use Metricbeat for Linux server or can we use Prometheus?
What is the difference in system metrics sent by Metricbeat and Prometheus node exporters?
Regards, Sunil.
If you're already using Prometheus for your system metrics, then it seems like standing up Elasticsearch just for Linux host monitoring is excessive. The node_exporter is probably sufficient if you'e looking for standard system metrics.
Another thing to consider is that Metricbeat / ELK use a push model for metrics delivery, whereas Prometheus pulls metrics from each node it is monitoring. Depending on how you manage your network security, opting for one solution over two may make things simpler.
Hi Sunil! Unfortunately, I don´t have much experience with Metricbeat so I can´t advise on the diffs with Prometheus...for Linux server, I encourage you to use Prometheus node exporter and for PCF, I would recommend using the instana tile (https://www.instana.com/supported-technologies/pivotal-cloud-foundry/). Let me know if you have further questions! Regards Jose
My team is divided on using Centreon or Zabbix for enterprise monitoring and alert automation. Can someone let us know which one is better? There is one more tool called Datadog that we are using for cloud assets. Of course, Datadog presents us with huge bills. So we want to have a comparative study. Suggestions and advice are welcome. Thanks!
I work at Volvo Car Corporation as a consultant Project Manager. We have deployed Zabbix in all of our factories for factory monitoring because after thorough investigation we saw that Zabbix supports the wide variety of Operating Systems, hardware peripherals and devices a Car Manufacturer has.
No other tool had the same amount of support onboard for our production environment and we didn't want to end up using a different tool again for several areas. That is the major strong point about Zabbix and it's free of course. Another strong point is the documentation which is widely available; Zabbix Youtube channel with tutorial video's, Zabbix share which holds free templates, the Zabbix online documentation and the Zabbix forum also helped us out quite a bit. Deployment is quite easy since it uses templates, so almost all configuration can be done on server side.
To conclude, we are really pleased with the tool so far, it helped us detect several causes of issues that were a pain to solve in the past.
Centreon is part of the Nagios ecosystem, meaning there is a huge number of resources you may find around in the community (plugins, skills, addons). Zabbix monitoring paradigms are totally different from Centreon. Centreon plugins have some kind of intelligence when they are launched, where Zabbix monitoring rules are configured centrally with the raw data collected. Testing both will help you understand :) Users used to say Centreon may be faster for setup and deployment. And in the end, both are full of monitoring features. Centreon has out of the box a full catalog of probes from cloud to the edge https://www.centreon.com/en/plugins-pack-list/ As soon as you have defined your monitoring policies and template, you can deploy it fast through command line API or REST API. Centreon plays well in the ITSM, Automation, AIOps spaces with many connectors for Prometheus, ServiceNow, GLPI, Ansible, Chef, Splunk, ... The polling server mode is one of the differentiators with Centreon. You set up remote server(s) and chose btw multiple information-exchange mechanisms. Powerful and resilient for remote, VPN, DMZ, satellite networks. Centreon is a good value for price to do a data collection (availability, performance, fault) on a wide range of technologies (physical, legacy, cloud). There are pro support and enterprise version with dashboards and reporting. IT Central Station gathers many user feedback you can rely on both Centreon & Zabbix https://www.itcentralstation.com/products/centreon-reviews
We highly recommend Zabbix. We have used it to build our own monitoring product (available on cloud -like datadog- or on premise with support) because of its flexibility and extendability. It can be easily integrated with the powerful dashboarding and data aggregation of Grafana, so it is perfect. All configuration is done via web and templates, so it scales well and can be distributed via proxies. I think there also more companies providing consultancy in Zabbix (like ours) than Centreon and community is much wider. Also Zabbix roadmap and focus (compatibility with Elasticsearch, Prometheus, TimescaleDB) is really really good.
Hi Vivek, what's your stack? If huge monitoring bills are your concern and if you’re using a number of JVM languages, or mostly Scala / Akka, and would like “one tool to monitor them all”, Kamon might be the friendliest choice to go for.
Kamon APM’s major benefit is it comes with a built-in dashboard for the most important metrics to monitor, taking the pain of figuring out what to monitor and building your own dashboards for weeks out of the monitoring.
So, I am working in a big company where they have multiple different microservices running that are written in Golang. I am currently searching for a technology that can give me all the metric data from the microservices. What time-series databases would you recommend? or which databases would you recommend to further investigate? I appreciate any input.
Each of these tools can help you with micro service workload and work well. I will try to go through some good, bad and ugly of each.
Datadog has an easy setup and time to get something tangible out of it. The cost model is by host so this is something to take into consideration how it will affect your use case. Also as a large organization at some point you will probably want control over some/all of your telemetry data to run your own ML or AI processes. With Datadog you this can be difficult as you will need to create processes outside of its closed eco system to get Raw metrics.
Prometheus is a great tool. It also has a fairly straight forward setup especially with Kubernetes. If you are running your micro services in k8s then this is going to get used one way or another; it is a first class citizen there with heavy utilization of K8s API. I also like the fact that Kubernetes architecture is easy to understand and that it utilizes Grafana for the visualization engine. Prometheus at scale can be done but it is a pain. Especially with a distributed infrastructure across multiple workloads.
Influxdb (TICK stack in v1) is known for its scalability and flexibility as a time series database. Telegraf is the main input/data-forwarder of the architecture and is completely decoupled from the database as are the other 3 components of the stack. Influx has made it very easy to just use one component on its own. I have worked on stacks that just used telegraf for ingestion into Kinesis or another data stream. I have also worked on stacks that used Influx database but used a different ETL process for analyzing the data in realtime instead of using their v1 architectures Kapacitor query engine. Influx database is a great performing time series database that in version 2 runs within kubernetes and utilizes Flux as the query language. Flux is a nice query language that is fairly easy to learn and has a lot of flexibility. As a last positive note Telegraf is written in Go so that would fit well with your current team.
The difficulties of Influx are that it is hard to get something really tangible out of it. Initial time to see something is fast but all the other work involved is a lot. You also have to understand the architecture well. The management of Influx can be cumbersome but it can scale up better than the other two when Datadogs cost is taken into consideration. They have a lot of API hooks in their V1 enterprise edition to wire and configure it. They do offer a mange service to offload this cost until later.
My overall choice here is probably to go with some of the influx as you can rip and/or add components as needed into the flow. Eventually you will probably want to run an ML process within there (can be done within Kapacitor but of course can also use your cloud provider here too) and this gives you the flexibility to do it anywhere. I would still go through prometheus because you will most likely use it also, but it does have forwarders to Influxdb so still fits.
We're running Prometheus/Alertmanager/Grafana across our whole company for any monitoring and metrics requirement, from the infrastructure layer all the way up to Springboot endpoint services, the prometheus exporter / scraping approach works pretty well for us. It's really easy to setup and more importantly; to maintain it without much effort, all the Prometheus configs get automatically created through Terraform outputs and Ansible jobs. Combine it with Grafana and you're smiling.
We're moving towards Prometheus from Datadog at this moment. Main driving force is TOC at the moment.
Datadog is great until it becomes too expensive.
We're looking for a Monitoring and Logging tool. It has to support AWS (mostly 100% serverless, Lambdas, SNS, SQS, API GW, CloudFront, Autora, etc.), as well as Azure and GCP (for now mostly used as pure IaaS, with a lot of cognitive services, and mostly managed DB). Hopefully, something not as expensive as Datadog or New relic, as our SRE team could support the tool inhouse. At the moment, we primarily use CloudWatch for AWS and Pandora for most on-prem.
I worked with Datadog at least one year and my position is that commercial tools like Datadog are the best option to consolidate and analyze your metrics. Obviously, if you can't pay the tool, the best free options are the mix of Prometheus with their Alert Manager and Grafana to visualize (that are complementary not substitutable). But I think that no use a good tool it's finally more expensive that use a not really good implementation of free tools and you will pay also to maintain its.
this is quite affordable and provides what you seem to be looking for. you can see a whole thing about the APM space here https://www.apmexperts.com/observability/ranking-the-observability-offerings/
I haven't heard much about Datadog until about a year ago. Ironically, the NewRelic sales person who I had a series of trainings with was trash talking about Datadog a lot. That drew my attention to Datadog and I gave it a try at another client project where we needed log handling, dashboards and alerting.
In 2019, Datadog was already offering log management and from that perspective, it was ahead of NewRelic. Other than that, from my perspective, the two tools are offering a very-very similar set of tools. Therefore I wouldn't say there's a significant difference between the two, the decision is likely a matter of taste. The pricing is also very similar.
The reasons why we chose Datadog over NewRelic were:
- The presence of log handling feature (since then, logging is GA at NewRelic as well since falls 2019).
- The setup was easier even though I already had experience with NewRelic, including participation in NewRelic trainings.
- The UI of Datadog is more compact and my experience is smoother.
- The NewRelic UI is very fragmented and New Relic One is just increasing this experience for me.
- The log feature of Datadog is very well designed, I find very useful the tagging logs with services. The log filtering is also very awesome.
Bottom line is that both tools are great and it makes sense to discover both and making the decision based on your use case. In our case, Datadog was the clear winner due to its UI, ease of setup and the awesome logging and alerting features.
I chose Datadog APM because the much better APM insights it provides (flamegraph, percentiles by default).
The drawbacks of this decision are we had to move our production monitoring to TimescaleDB + Telegraf instead of NR Insight
NewRelic is definitely easier when starting out. Agent is only a lib and doesn't require a daemon
Pros of Datadog
- Monitoring for many apps (databases, web servers, etc)140
- Easy setup107
- Powerful ui87
- Powerful integrations84
- Great value70
- Great visualization54
- Events + metrics = clarity46
- Notifications41
- Custom metrics41
- Flexibility39
- Free & paid plans19
- Great customer support16
- Makes my life easier15
- Adapts automatically as i scale up10
- Easy setup and plugins9
- Super easy and powerful8
- In-context collaboration7
- AWS support7
- Rich in features6
- Docker support5
- Cute logo4
- Simple, powerful, great for infra4
- Monitor almost everything4
- Full visibility of applications4
- Easy to Analyze4
- Cost4
- Source control and bug tracking4
- Best than others4
- Automation tools4
- Best in the field3
- Expensive3
- Good for Startups3
- Free setup3
- APM2
Pros of Prometheus
- Powerful easy to use monitoring47
- Flexible query language38
- Dimensional data model32
- Alerts27
- Active and responsive community23
- Extensive integrations22
- Easy to setup19
- Beautiful Model and Query language12
- Easy to extend7
- Nice6
- Written in Go3
- Good for experimentation2
- Easy for monitoring1
Pros of Zabbix
- Free21
- Alerts9
- Service/node/network discovery5
- Templates5
- Base metrics from the box4
- Multi-dashboards3
- SMS/Email/Messenger alerts3
- Grafana plugin available2
- Supports Graphs ans screens2
- Support proxies (for monitoring remote branches)2
- Perform website checking (response time, loading, ...)1
- API available for creating own apps1
- Templates free available (Zabbix Share)1
- Works with multiple databases1
- Advanced integrations1
- Supports multiple protocols/agents1
- Complete Logs Report1
- Open source1
- Supports large variety of Operating Systems1
- Supports JMX (Java, Tomcat, Jboss, ...)1
Sign up to add or upvote prosMake informed product decisions
Cons of Datadog
- Expensive20
- No errors exception tracking4
- External Network Goes Down You Wont Be Logging2
- Complicated1
Cons of Prometheus
- Just for metrics12
- Bad UI6
- Needs monitoring to access metrics endpoints6
- Not easy to configure and use4
- Supports only active agents3
- Written in Go2
- TLS is quite difficult to understand2
- Requires multiple applications and tools2
- Single point of failure1
Cons of Zabbix
- The UI is in PHP5
- Puppet module is sluggish2