Datadog vs PagerDuty: What are the differences?
Datadog: Unify logs, metrics, and traces from across your distributed infrastructure. Datadog is the leading service for cloud-scale monitoring. It is used by IT, operations, and development teams who build and operate applications that run on dynamic or hybrid cloud infrastructure. Start monitoring in minutes with Datadog!; PagerDuty: Incident management with powerful visibility, reliable alerting, and improved collaboration. PagerDuty is an alarm aggregation and dispatching service for system administrators and support teams. It collects alerts from your monitoring tools, gives you an overall view of all of your monitoring alarms, and alerts an on duty engineer if there's a problem.
Datadog belongs to "Performance Monitoring" category of the tech stack, while PagerDuty can be primarily classified under "Monitoring Aggregation".
Some of the features offered by Datadog are:
- 14-day Free Trial for an unlimited number of hosts
- 200+ turn-key integrations for data aggregation
- Clean graphs of StatsD and other integrations
On the other hand, PagerDuty provides the following key features:
- Alerting that works (and wakes you up)- When your systems go down, PagerDuty will wake you up. You choose how you want to be alerted - via phone, SMS or email, to multiple numbers, with retries.
- Integrate all your existing monitoring tools- PagerDuty works great with almost all monitoring tools including: Nagios (and Icinga), Keynote, New Relic, Pingdom, Circonus, Red Gate SQL Monitor, Server Density, Zenoss, Monit, Munin, SolarWinds and many others. If it can send email, it will work with PagerDuty.
- Native apps with push notifications- iOS and Android native apps with push notifications and a cross-platform mobile website ensure you can respond to alerts wherever you are, even on the go.
"Monitoring for many apps (databases, web servers, etc)" is the primary reason why developers consider Datadog over the competitors, whereas "Just works" was stated as the key factor in picking PagerDuty.
According to the StackShare community, Datadog has a broader approval, being mentioned in 541 company stacks & 223 developers stacks; compared to PagerDuty, which is listed in 303 company stacks and 47 developer stacks.
What is Datadog?
What is PagerDuty?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
Data science and engineering teams at Lyft maintain several big data pipelines that serve as the foundation for various types of analysis throughout the business.
Apache Airflow sits at the center of this big data infrastructure, allowing users to “programmatically author, schedule, and monitor data pipelines.” Airflow is an open source tool, and “Lyft is the very first Airflow adopter in production since the project was open sourced around three years ago.”
There are several key components of the architecture. A web UI allows users to view the status of their queries, along with an audit trail of any modifications the query. A metadata database stores things like job status and task instance status. A multi-process scheduler handles job requests, and triggers the executor to execute those tasks.
Airflow supports several executors, though Lyft uses CeleryExecutor to scale task execution in production. Airflow is deployed to three Amazon Auto Scaling Groups, with each associated with a celery queue.
Audit logs supplied to the web UI are powered by the existing Airflow audit logs as well as Flask signal.
Datadog, Statsd, Grafana, and PagerDuty are all used to monitor the Airflow system.
Which #APM / #Infrastructure #Monitoring solution to use?
The 2 major players in that space are New Relic and Datadog Both are very comparable in terms of pricing, capabilities (Datadog recently introduced APM as well).
In our use case, keeping the number of tools minimal was a major selection criteria.
As we were already using #NewRelic, my recommendation was to move to the pro tier so we would benefit from advanced APM features, synthetics, mobile & infrastructure monitoring. And gain 360 degree view of our infrastructure.
Few things I liked about New Relic: - Mobile App and push notificatin - Ease of setting up new alerts - Being notified via email and push notifications without requiring another alerting 3rd party solution
I've certainly seen use cases where NewRelic can also be used as an input data source for Datadog. Therefore depending on your use case, it might also be worth evaluating a joint usage of both solutions.
Our primary source of monitoring and alerting is Datadog. We’ve got prebuilt dashboards for every scenario and integration with PagerDuty to manage routing any alerts. We’ve definitely scaled past the point where managing dashboards is easy, but we haven’t had time to invest in using features like Anomaly Detection. We’ve started using Honeycomb for some targeted debugging of complex production issues and we are liking what we’ve seen. We capture any unhandled exceptions with Rollbar and, if we realize one will keep happening, we quickly convert the metrics to point back to Datadog, to keep Rollbar as clean as possible.
We use Segment to consolidate all of our trackers, the most important of which goes to Amplitude to analyze user patterns. However, if we need a more consolidated view, we push all of our data to our own data warehouse running PostgreSQL; this is available for analytics and dashboard creation through Looker.
We're a real-time financial services messaging company, so being able to monitor our servers and applications in real-time is important to us. We also like a good deal, so $15/server seemed a bargain.
What were we looking for?
We wanted to monitor our MS infrastructure (servers, SQL) and apps (C#) to understand performance issues and be able to rectify. We also want to be able to do long-term trending. And we wanted to go from nothing to live in a short time.
Installing the Datadog agent on the servers was a breeze and enabling the integrations for SQL and Windows trivial.
Using the StatsD based API was also very easy - no worrying about JSON or UDP calls. The ability to add tags to all metrics is also a key benefit. We run multiple (100+) instances of a single application and being able to distinguish events from each one via tagging, or to see aggregates, is extremely useful.
In all it took 2 days R&D to instrument our key applications sufficiently for production deployment. Deploying the agent to our production servers took 30 mins, giving our Ops team complete visibility for the 1st time.
What have we learned
Since we've been live Datadog has given us numerous insights into the way our system behaves, from uneven server loadings and sporadic memory usage to performance tuning a key application that resulted in a 50% increase in throughput. Knowing what's taking the time has been a boon.
The other nice surprise has been the evolving nature of Datadog. It seems like every couple of weeks there's a new feature on the site.
- I like the transparent pricing. Services that won't show me the price without having to talk to a sales person are really annoying.
- Support has been good. We've contacted them several times with questions and always had a quick response (time zone considered...we're in London) and a helpful answer.
So What's bad?
Probably the weakest aspect at the moment is the long term trending of data. Whilst you can wind the time bar back to see what happened last week you can't ask questions like "show me the peak period each day for the last x months". The "get data" API is also fairly weak. Neither are concerns at the moment, and I'm sure they're on the to-do list.
I've been a systems administrator most of my career. Everywhere I went, I'd have to rebuild the same monitoring + graphing system. And then make sure that every machine wrote to that system and every application handed up the proper metrics through whatever mechanism seemed good at the time.
Then, as CTO of SimpleReach, single-handedly managing over 200 servers in addition to everything else, I found Datadog. We were already using statsd to instrument our applications, now it was just a matter of getting that data to Datadog. We use Chef, so I installed the Datadog agent on every machine in about 10 minutes and we were up and running.
The best part was that we had a deploy problem the next day with one of our main applications and troubleshooting took minutes instead of hours (and Datadog immediately paid for itself). Now no new features go out without instrumentation and no machine gets created without being monitored.
Datadog just scales with us. Great service and I highly recommend it to anyone not looking to reinvent the wheel with monitoring and instrumentation.
Datadog makes running a service with 800,000 unique users a month possible as a single developer/maintainer. I bought a separate monitor just to keep my datadog dashboards always visible and rely on triggers to keep watch over 20+ servers.
We use datadog to monitor our servers and some application metrics. Easy to get started and scale to many servers. Datadog support engineers are always quick to respond to bugs and other challenges.
Pagerduty integrates with pretty much everything, and for the ones it doesn't its easy to get it done!
We just started looking into Datadog, but from what we see, it's like New Relic meets Loggly. It's really easy to plugin different services (like the one on this list) and get detailed analysis of what is happening on your servers and services. It makes tracking down sparse and difficult to understand problems possible.
Monitoring day-to-day operations of multiple high-performance computing assets distributed across several networks. Monitoring vendor provided data and setting up alerts when things do not show up on time.
whistles If there's something weird, in your infrastructure, who you gonna call?
DevOps-Buste.. You get the idea. PagerDuty is great for quickly notifying us when things go pearshaped.
Datadog was used as an agent for monitoring and as for the statsd daemon included. This way we are able to have automated system stats and include whatever other metrics we want to track.
Datadog is used because it has a great free tier and it provides us with great insights and integrations into our infrastructure and tools.
Powerful all-in-one monitoring solution as a service. Good integration with AWS. Very affordable price for small-scale startups.
Luckily we don't end up actually using this much, but we couldn't live without it.
alerts me to any issues and blends well with other tools for uptime monitoring