PagerDuty

Verified

PagerDuty is an alarm aggregation and dispatching service for system administrators and support teams. It collects alerts from your monitoring tools, gives you an overall view of all of your monitoring alarms, and alerts an on duty engineer if there's a problem.

www.pagerduty.com?utm_source=lsio
15 Tools, 5 Decisions, 349 Followers

Tech Stack

Utilities

3 tools

Braintree
Slack
Twilio

Business Tools

1 tool

Salesforce Sales Cloud

Team Members

chrisgagne
angchappy
dbirck34
baskarp
timarmandpour
dshack
afolson, Developer Evangelist

Stack Decisions

StackShare Editors

Sep 3, 2016

Distributed Task Scheduling with Akka, Kafka, Cassandra

To solve the problem of scheduling and executing arbitrary tasks in its distributed infrastructure, PagerDuty created an open-source tool called Scheduler. Scheduler is written in Scala and uses Cassandra for task persistence. It uses Apache Kafka for task queuing and partitioning, and Akka to structure the library’s concurrency.

A service schedules a task by passing it to Scheduler’s Scala API, which serializes the task metadata and enqueues it into Kafka. Scheduler then consumes the tasks and writes them to Cassandra to prevent data loss.
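The flow described above (serialize, enqueue into a partitioned queue, consume, persist) can be sketched in a few lines. This is a minimal in-memory illustration in Python, not Scheduler's actual Scala API: the `Task` fields, `InMemoryQueue` (standing in for a Kafka topic), `InMemoryStore` (standing in for a Cassandra table), and the partition count are all hypothetical names chosen for the example.

```python
import json
import zlib
from collections import defaultdict
from dataclasses import asdict, dataclass

NUM_PARTITIONS = 4  # assumption: a small fixed partition count for the demo


@dataclass(frozen=True)
class Task:
    ordering_id: str       # hypothetical: tasks sharing this id share a partition
    unique_key: str        # hypothetical: identifies the task within its ordering_id
    scheduled_time: float  # epoch seconds at which the task should run
    payload: dict          # arbitrary task metadata


def partition_for(ordering_id: str) -> int:
    """Map an ordering id to a partition with a stable hash."""
    return zlib.crc32(ordering_id.encode("utf-8")) % NUM_PARTITIONS


class InMemoryQueue:
    """Stand-in for a partitioned Kafka topic."""

    def __init__(self) -> None:
        self.partitions = defaultdict(list)

    def enqueue(self, task: Task) -> int:
        # Serialize the task metadata, then append it to its partition.
        record = json.dumps(asdict(task)).encode("utf-8")
        partition = partition_for(task.ordering_id)
        self.partitions[partition].append(record)
        return partition


class InMemoryStore:
    """Stand-in for a Cassandra table keyed by (ordering_id, unique_key)."""

    def __init__(self) -> None:
        self.rows = {}

    def persist(self, record: bytes) -> None:
        data = json.loads(record)
        self.rows[(data["ordering_id"], data["unique_key"])] = data


def consume_and_persist(queue: InMemoryQueue, store: InMemoryStore) -> int:
    """Drain every partition in order, persisting each task; return the count."""
    count = 0
    for records in queue.partitions.values():
        while records:
            store.persist(records.pop(0))
            count += 1
    return count
```

Hashing on an ordering id keeps related tasks on one partition, which is one common way to preserve per-key ordering in a partitioned queue; whether Scheduler partitions exactly this way is not stated in the summary above.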

427k views

StackShare Editors

Oct 15, 2014

Throwing more hardware at Cassandra and no more multi-tenancy

On June 3, 2014, PagerDuty experienced a major incident: its Cassandra pipeline stopped processing events and refused new ones. The outage lasted 3 hours, with a further period of degraded performance.

"Cassandra seems to have two modes: fine and catastrophe," said one PagerDuty engineer, after a seemingly routine repair cascaded into a very bad situation. Constant memory pressure and underprovisioned RAM were identified among the factors pointing to weaknesses in how the cluster was set up.

After the outage, each node in the Cassandra cluster was replaced with an m2.2xlarge EC2 instance with 4 cores and 32GB of RAM. PagerDuty also moved away from its multi-tenant Cassandra setup at that point, to help isolate failures in the future.

28.3k views

StackShare Editors

May 9, 2014

Using build artifacts to improve mobile app packaging

In 2014, PagerDuty struggled to release its mobile applications safely and reliably because of issues with how the code was packaged and handled.

PagerDuty’s mobile apps were hybrid apps that used Cordova to share code between platforms. Coding was straightforward but packaging was not, because a separate Gulp-based build process was also in use. The PagerDuty team took a page from Java and started creating software artifacts.

Rather than checking in transformed code or publishing modules to npm, the team started creating zipped-up build artifacts, a natural fit for GitHub's Releases feature, which had arrived in 2013. So although JavaScript lacks a standard packaged app format like a JAR, PagerDuty was still able to improve the build times and sizes of its mobile apps.
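As an illustration of the zipped-artifact idea, here is a minimal Python sketch that packs a build output directory into a single archive and returns a checksum that could be published alongside it. The function name `build_artifact` and the directory layout are assumptions for the example; the summary above does not describe PagerDuty's actual Gulp tasks.

```python
import hashlib
import zipfile
from pathlib import Path


def build_artifact(build_dir: Path, out_path: Path) -> str:
    """Zip every file under build_dir into out_path and return its SHA-256.

    out_path should live outside build_dir so the archive does not
    try to include itself.
    """
    # Sort the file list so the archive's entry order is stable across runs.
    files = sorted(p for p in build_dir.rglob("*") if p.is_file())
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Store paths relative to build_dir, using forward slashes.
            zf.write(f, arcname=f.relative_to(build_dir).as_posix())
    return hashlib.sha256(out_path.read_bytes()).hexdigest()
```

A call such as `build_artifact(Path("dist"), Path("app-1.0.0.zip"))` would produce one file that can be attached to a release, so consumers download a finished build instead of re-running the transform step.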

111k views

StackShare Editors

Nov 7, 2013

Chef at PagerDuty

In late 2013, the Operations Engineering team at PagerDuty consisted of 4 engineers, all generalists with one or two areas of depth each. Although the Operations Engineering team ran its own on-call rotation, every other engineering team at PagerDuty also carried the pager.

The Operations Engineering team owned 150+ servers spanning multiple cloud providers, and used Chef to automate that infrastructure with a mix of fully custom cookbooks and customized community cookbooks.

Custom cookbooks were managed with Berkshelf, and each custom cookbook contained its own tests based on ChefSpec 3, coupled with RSpec.

Jenkins was used to monitor GitHub for new changes and to handle unit testing of those changes.

308k views

StackShare Editors

Oct 30, 2012

Switching Rails and MySQL to unicode without downtime

The situation in 2012 at PagerDuty was challenging: MySQL had been set up with default settings, including a Latin character encoding rather than Unicode. This started causing problems for users with non-romanized names, for various transliteration functions, and in the form of BLOB storage errors. PagerDuty decided to move to a universal character set once and for all.

The goals for the migration were less than 1 minute of downtime and at most a 2x increase in storage size for affected fields.

It took some trial and error to get MySQL to store data the right way, and a daring few moments to flip from the old architecture to the new, but in the end the Unicode migration went off without a hitch.
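The storage target above can be made concrete with a small Python sketch. It compares the byte length of the same text in a single-byte Latin encoding and in UTF-8. Python's `latin-1` codec (ISO-8859-1) is used as a stand-in; MySQL's own `latin1` character set differs in some code points, so this is illustrative only, and whether this reasoning is what motivated PagerDuty's 2x figure is an assumption.

```python
from typing import Tuple


def encoded_sizes(text: str) -> Tuple[int, int]:
    """Return (latin-1 byte length, UTF-8 byte length) for the same text."""
    return len(text.encode("latin-1")), len(text.encode("utf-8"))


# ASCII characters stay 1 byte in UTF-8; code points U+0080 to U+00FF take 2.
samples = ["PagerDuty", "Zo\u00eb M\u00fcller", "\u00ff\u00ff\u00ff\u00ff"]

for sample in samples:
    before, after = encoded_sizes(sample)
    # For text representable in ISO-8859-1, UTF-8 never more than doubles it.
    assert before <= after <= 2 * before
```

Text that is mostly ASCII grows very little, while a field made up entirely of accented Latin characters doubles, which is the worst case for this encoding pair.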

3.04k views