What is PagerDuty?
Who uses PagerDuty?
Here are some stack decisions, common use cases and reviews by companies and developers who chose PagerDuty in their tech stack.
I chose Sqreen because it provides an out-of-the-box Security as a Service solution to protect my customer data. I get full visibility over my application security in real-time and I reduce my risk against the most common threats. My customers are happy and I don't need to spend any engineering resources or time on this. We're only alerted when our attention is required and the data that is provided helps engineering teams easily remediate vulnerabilities. The platform grows with us and will allow us to have all the right tools in place when our first security engineer joins the company. Advanced security protections against business logic threats can then be implemented.
Installation was super easy on my Node.js and Ruby apps. But Sqreen also supports Python, Java, PHP and soon Go.
It integrates well with the tools I'm using every day: Slack, PagerDuty and more.
Our primary source of monitoring and alerting is Datadog. We’ve got prebuilt dashboards for every scenario and integration with PagerDuty to manage the routing of any alerts. We’ve definitely scaled past the point where managing dashboards is easy, but we haven’t had time to invest in using features like Anomaly Detection. We’ve started using Honeycomb for some targeted debugging of complex production issues and we are liking what we’ve seen. We capture any unhandled exceptions with Rollbar and, if we realize one will keep happening, we quickly convert the metrics to point back to Datadog, to keep Rollbar as clean as possible.
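To make the alert-routing idea concrete, here is a minimal sketch of what a trigger event for PagerDuty's Events API v2 can look like when a different integration key is chosen per service. It only builds the JSON body; it does not describe how the Datadog integration works internally. The service names, the placeholder keys in `ROUTING_KEYS`, and the helper `build_trigger_event` are assumptions made for illustration.

```python
import json

# Hypothetical mapping from a service name to a PagerDuty integration (routing) key.
# Real keys are issued per integration in PagerDuty; these values are placeholders.
ROUTING_KEYS = {
    "payments": "ROUTING_KEY_PAYMENTS",
    "search": "ROUTING_KEY_SEARCH",
}
DEFAULT_ROUTING_KEY = "ROUTING_KEY_CATCH_ALL"

# Severity values accepted by the Events API v2 payload.
VALID_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(service, summary, severity="error", dedup_key=None):
    """Build an Events API v2 'trigger' event body for the given service."""
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    event = {
        "routing_key": ROUTING_KEYS.get(service, DEFAULT_ROUTING_KEY),
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": service,
            "severity": severity,
        },
    }
    if dedup_key is not None:
        # Trigger events that share a dedup_key are grouped into the same alert.
        event["dedup_key"] = dedup_key
    return event


if __name__ == "__main__":
    event = build_trigger_event(
        "payments", "p99 latency above 2s", "critical", "payments-latency"
    )
    print(json.dumps(event, indent=2))
```

A body like this would be sent as an HTTP POST to the Events API v2 enqueue endpoint; unknown services fall back to the catch-all key so that no alert is silently dropped.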
We use Segment to consolidate all of our trackers, the most important of which goes to Amplitude to analyze user patterns. However, if we need a more consolidated view, we push all of our data to our own data warehouse running PostgreSQL; this is available for analytics and dashboard creation through Looker.
A huge part of our continuous deployment practices is to have granular alerting and monitoring across the platform. To do this, we run Sentry on-premise, inside our VPCs, for our event alerting, and we run an awesome observability and monitoring system consisting of StatsD, Graphite and Grafana. We have dashboards using this system to monitor our core subsystems so that we can know the health of any given subsystem at any moment. This system ties into our PagerDuty rotation, as well as alerts from some of our Amazon CloudWatch alarms (we’re looking to migrate all of these to our internal monitoring system soon).
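For readers unfamiliar with StatsD, the sketch below shows the plain-text line format it accepts (`<name>:<value>|<type>`, with an optional `|@<sample_rate>` suffix) and how a metric can be sent over UDP. The metric names, host, and helper functions are assumptions for illustration and are not taken from the setup described above; port 8125 is the conventional StatsD default.

```python
import socket


def format_metric(name, value, metric_type, sample_rate=None):
    """Render one metric in the StatsD line format: <name>:<value>|<type>[|@<rate>]."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric line over UDP (fire-and-forget; no reply is expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names: a counter, a timer, and a gauge.
    send_metric(format_metric("checkout.requests", 1, "c"))
    send_metric(format_metric("checkout.latency_ms", 182, "ms"))
    send_metric(format_metric("checkout.queue_depth", 7, "g"))
```

In a stack like the one described, a StatsD daemon would aggregate these lines and flush them to Graphite, where Grafana dashboards can chart them.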
Data science and engineering teams at Lyft maintain several big data pipelines that serve as the foundation for various types of analysis throughout the business.
Apache Airflow sits at the center of this big data infrastructure, allowing users to “programmatically author, schedule, and monitor data pipelines.” Airflow is an open source tool, and “Lyft is the very first Airflow adopter in production since the project was open sourced around three years ago.”
There are several key components of the architecture. A web UI allows users to view the status of their queries, along with an audit trail of any modifications to the query. A metadata database stores things like job status and task instance status. A multi-process scheduler handles job requests, and triggers the executor to execute those tasks.
Airflow supports several executors, though Lyft uses CeleryExecutor to scale task execution in production. Airflow is deployed to three Amazon Auto Scaling Groups, each associated with a Celery queue.
Audit logs supplied to the web UI are powered by the existing Airflow audit logs as well as Flask signals.
Datadog, StatsD, Grafana, and PagerDuty are all used to monitor the Airflow system.
I'm currently on PagerDuty, but I'm about to add enough users to go out of the starter tier, which will dramatically increase my license cost. PagerDuty is, in my experience, quite clunky, and I'm looking for alternatives. Squadcast is one I've found, and another is xMatters. Between the three, I'm currently leaning towards xMatters, but I'd like to know what people suggest.
Luckily we don't end up actually using PagerDuty much, but we couldn't live without it.
- Alerting that works (and wakes you up): When your systems go down, PagerDuty will wake you up. You choose how you want to be alerted: via phone, SMS or email, to multiple numbers, with retries.
- Integrate all your existing monitoring tools: PagerDuty works with almost all monitoring tools, including Nagios (and Icinga), Keynote, New Relic, Pingdom, Circonus, Red Gate SQL Monitor, Server Density, Zenoss, Monit, Munin, SolarWinds and many others. If it can send email, it will work with PagerDuty.
- Native apps with push notifications: iOS and Android native apps with push notifications and a cross-platform mobile website ensure you can respond to alerts wherever you are, even on the go.
- On-call duty scheduling: Easily set up schedules to fairly share on-call duty responsibilities with your team.
- Automatic escalation of alerts: If you're paged but don't respond in time, the alert is auto-escalated to a team member. This ensures nothing slips through the cracks, ever.
- Reliable, distributed architecture: PagerDuty's infrastructure is fully replicated in multiple data centers, with fast failover when problems occur.
- Works internationally (yes, really!): Phone alerts can be delivered to over 170 countries and territories, and SMS alerts are available virtually worldwide.
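
The on-call scheduling and automatic escalation features above can be illustrated with a small sketch. This is a toy model under stated assumptions, not PagerDuty's implementation: the function names, the weekly round-robin rule, and the per-level timeouts are all hypothetical.

```python
from datetime import date


def on_call_for(day, rotation, start, shift_days=7):
    """Return whoever is on call on `day` for a simple round-robin rotation."""
    if day < start:
        raise ValueError("day is before the rotation starts")
    shifts_elapsed = (day - start).days // shift_days
    return rotation[shifts_elapsed % len(rotation)]


def paged_so_far(minutes_unacknowledged, levels):
    """List the responders notified so far for an unacknowledged alert.

    `levels` is an ordered list of (responder, escalate_after_minutes) pairs:
    the first responder is paged immediately, and each later one is paged once
    every earlier level has timed out without an acknowledgement.
    """
    paged = []
    threshold = 0  # minutes after which the current level gets paged
    for responder, escalate_after in levels:
        if minutes_unacknowledged < threshold:
            break
        paged.append(responder)
        threshold += escalate_after
    return paged


if __name__ == "__main__":
    team = ["alice", "bob", "carol"]  # hypothetical responders
    print(on_call_for(date(2024, 1, 8), team, start=date(2024, 1, 1)))
    print(paged_so_far(20, [("alice", 15), ("bob", 15), ("carol", 30)]))
```

With a weekly rotation starting on 1 January 2024, the second week falls to the second person in the list; an alert left unacknowledged for 20 minutes has reached the first two escalation levels but not the third.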