How Sentry Receives 20 Billion Events Per Month While Preparing to Handle Twice That

Sentry
Sentry’s Application Monitoring platform helps developers see performance issues, fix errors faster, and optimize code health.

By James Cunningham, Operations Engineer, Sentry.


About Sentry

Sentry illustration

Unless your engineering team is staffed by angels who commute down to the office from heaven every morning, we’re pretty confident you run into plenty of problems developing and iterating on your applications in production. Sentry provides all the tools you need to find, triage, reproduce, and fix application-level issues before your users even know there was a problem. With the added bonus that you won’t get any more nasty looks from support engineers at happy hour.

By automating error detection, aggregating stack traces, and adding important context to them, Sentry helps you proactively correct the errors that are doing the most harm to your business, efficiently, durably, and with minimal disruption. Closing the gap between the product team and customers improves productivity, speeds up the entire development process, and helps engineers focus on what they do best: build apps that make users’ lives better.

I was personally a Sentry user way before I was an employee. Early on at my previous company, I was tasked with upgrading the open-source error tracking service that hadn’t really been maintained or used for a while. I reached out for help and heard back from David (Sentry’s co-founder) and Matt (Sentry’s second engineer), meeting two of my future co-workers on IRC years before I ever saw their faces (protip: connect with Matt on LinkedIn).


This is Matt


They were incredibly helpful and, when I went looking for a new job, I thought, “Hey, this is a very nice piece of software, and the people who are running it are really mindful of their community. I’d love to be a part of that.” Today, I spend my waking hours happily keeping Sentry’s hosted service operational, available, and responsive to our exponentially increasing event volume (editor’s note: when he’s not trolling new hires on Slack for their taste in hip-hop and Fruit Gushers).

A Powerful Side Project

Sentry started as (and remains) an open-source project, growing out of an error logging tool David built in 2008. He displayed a truly shrewd notion of branding even then, giving the project a catchy name that companies the world over remain jealous of to this day: django-db-log. For the longest time, Sentry’s subtitle on GitHub was “A simple Django app, built with love.” A slightly more accurate description probably would have included Starcraft and Soylent alongside love; regardless, this captured what Sentry was all about.

That original build nine years ago was Django and Celery (Python’s asynchronous task queue), with Postgres as the database and Redis as the power behind Celery.

A Fast-Growing Company

As you might expect, Sentry usage has grown exponentially over the past decade, and the infrastructure has changed and matured to accommodate massive scale. We now host the open-source project as a SaaS product. Sentry has SDKs for just about every framework, platform, and language, plus integrations with the most popular developer tools, which helps make it incredibly easy to adopt. Today, Sentry is central to the error tracking and resolution workflows of tens of thousands of organizations and more than 100,000 active users around the world, many of whom support implementations for some of the biggest properties on the internet: Dropbox, Uber, Stripe, Airbnb, Xbox Live, HubSpot, and more. That’s 5 billion events per week, just from the hosted service.

When a customer sends events to Sentry, they don’t receive a laundry list of notifications; they get the aggregate issue, with counts of how often it has occurred and which of their users are experiencing it. This is all presented very simply and cleanly in Sentry, but if a user wants individual events, we provide those, too. We save every single event we accept, which gets very expensive to do in a traditional relational database.
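To make the aggregation idea concrete, here is a minimal sketch of rolling individual events up into issues. It is an illustration only, not Sentry's actual grouping logic: the `Issue` class, the `aggregate` function, and the fingerprint (error type plus location) are all assumptions invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Aggregate view of many similar events."""
    count: int = 0
    users: set = field(default_factory=set)
    event_ids: list = field(default_factory=list)


def aggregate(events):
    """Group events by a hypothetical fingerprint: error type plus location."""
    issues = defaultdict(Issue)
    for event in events:
        fingerprint = (event["type"], event["location"])
        issue = issues[fingerprint]
        issue.count += 1
        issue.users.add(event["user"])
        # Keep the IDs so individual events stay retrievable on request.
        issue.event_ids.append(event["id"])
    return dict(issues)


events = [
    {"id": 1, "type": "KeyError", "location": "views.py:42", "user": "ana"},
    {"id": 2, "type": "KeyError", "location": "views.py:42", "user": "bo"},
    {"id": 3, "type": "KeyError", "location": "views.py:42", "user": "ana"},
    {"id": 4, "type": "ValueError", "location": "forms.py:7", "user": "bo"},
]
issues = aggregate(events)
```

Four events collapse into two issues here, and each issue still carries the IDs needed to look up the underlying events.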

One of the first improvements Sentry made to address scalability was storing all of these events in a distributed key-value store. There are a variety of key-value stores out there, all with their promises and pitfalls, but when evaluating solutions, we ultimately chose Riak. Our Riak cluster does exactly what we want it to: write event data to more than one location, grow or shrink in size upon request, and persist through normal failure scenarios.
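The three properties listed above (multiple write locations, resizing, surviving failures) can be sketched with a toy in-memory store. This is not how Riak is implemented or how its client API looks; `ReplicatedStore`, the node names, and the use of rendezvous hashing to pick owners are assumptions chosen to keep the example short.

```python
import hashlib


class ReplicatedStore:
    """Toy key-value store that writes each value to several nodes."""

    def __init__(self, node_names, replicas=3):
        self.nodes = {name: {} for name in node_names}
        self.replicas = replicas

    def _owners(self, key):
        # Rank nodes by a hash of (node, key); the first `replicas` own the key.
        def score(name):
            return hashlib.sha256(f"{name}:{key}".encode()).hexdigest()
        return sorted(self.nodes, key=score)[: self.replicas]

    def put(self, key, value):
        for name in self._owners(key):
            self.nodes[name][key] = value

    def get(self, key):
        for name in self._owners(key):
            if key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)

    def add_node(self, name):
        self.nodes[name] = {}

    def remove_node(self, name):
        # Simulates a node failing or being decommissioned.
        self.nodes.pop(name)


store = ReplicatedStore(["node-1", "node-2", "node-3", "node-4", "node-5"])
store.put("event:abc123", {"message": "KeyError: 'user'"})

# Lose one of the nodes that holds the event; the read still succeeds.
failed = store._owners("event:abc123")[0]
store.remove_node(failed)
recovered = store.get("event:abc123")
```

A real store would also re-replicate data after membership changes so the copy count returns to three; the sketch skips that step.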

The first major infrastructure project that I contributed to when joining Sentry was horizontally scaling our ability to execute offline tasks. As Sentry runs throughout the day, there are about 50 different offline tasks that we execute: anything from “process this event, pretty please” to “send all of these cool people some emails.” There are some that we execute once a day and some that we execute thousands of times per second.

Managing this variety requires a reliably high-throughput message-passing technology. We run Celery on RabbitMQ, and we stumbled upon a great RabbitMQ feature called Federation. It allows us to partition our task queue across any number of RabbitMQ servers and gives us the confidence that, if any single server gets backlogged, the others will pitch in and distribute some of the backlogged tasks to their own consumers.
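The behavior described above can be modeled in a few lines. The following is a toy simulation of the idea (partitioned queues whose idle consumers pick up work from a backlogged peer), not RabbitMQ's Federation plugin or its configuration; the `Broker` class, `link` helper, and broker names are hypothetical.

```python
from collections import deque


class Broker:
    """One queue server with its own local consumers."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.peers = []

    def publish(self, task):
        self.queue.append(task)

    def consume(self):
        """Return the next local task, or pull one from the busiest peer."""
        if self.queue:
            return self.queue.popleft()
        busiest = max(self.peers, key=lambda b: len(b.queue), default=None)
        if busiest is not None and busiest.queue:
            return busiest.queue.popleft()
        return None


def link(brokers):
    """Make every broker aware of every other broker."""
    for broker in brokers:
        broker.peers = [b for b in brokers if b is not broker]


brokers = [Broker(f"rabbit-{i}") for i in range(3)]
link(brokers)

# Partition: publishers spread tasks across brokers, here by task ID.
for task_id in range(6):
    brokers[task_id % 3].publish(task_id)

# Simulate rabbit-0 becoming backlogged.
for task_id in range(100, 110):
    brokers[0].publish(task_id)

# Consumers on rabbit-1 drain their own queue, then help with the backlog.
handled_by_rabbit_1 = [brokers[1].consume() for _ in range(5)]
```

After its own two tasks are done, the consumer on rabbit-1 starts taking tasks from rabbit-0, which is the "others will pitch in" property in miniature.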

Another project we’ve undertaken is setting up safeguards in front of our application to protect it from unpredictable and unwanted traffic. When accepting events, we would be crazy to just expose the Python web process to the public Internet and say, “Alright, give me all you got!” Instead, we use two different proxying services that sit in front of our web machines:

  • NGINX, our product-aware proxy, enforces many of the upper bounds that we have deemed reasonable. It is responsible for a variety of limits, but its most popular one is protecting Sentry from exceedingly large event volumes. Every so often, a user will run into a problem where they’ve deployed their code out into the abyss, and their event volume clocks in at a few zeroes higher than what they signed up for.
  • In front of NGINX, we use another proxying service called HAProxy, which acts as a delta for connections: it has none of that product-awareness logic and much higher throughput. All it does is accept connections and send them off to different NGINX servers, allowing us to gracefully add or remove NGINX servers as we see fit.
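The two layers can be sketched as a connection balancer in front of a per-project limiter. This is an illustration of the division of labor only: it is not NGINX or HAProxy configuration, and the fixed-window counter is a simplification swapped in for brevity rather than the algorithm either proxy uses. The class names, project ID, and limits are hypothetical.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Product-aware layer: caps accepted events per project per window."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        # Grows without bound here; a real limiter would expire old windows.
        self.counts = defaultdict(int)

    def allow(self, project_id, now):
        window = int(now // self.window_seconds)
        key = (project_id, window)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


class RoundRobinBalancer:
    """Product-unaware layer: hands connections to backends in turn."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0

    def pick(self):
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return backend

    def add(self, backend):
        self.backends.append(backend)

    def remove(self, backend):
        self.backends.remove(backend)


limiter = FixedWindowLimiter(limit=3, window_seconds=60)
balancer = RoundRobinBalancer(["nginx-1", "nginx-2"])

results = []
for second in range(5):
    backend = balancer.pick()
    accepted = limiter.allow("project-42", now=second)
    results.append((backend, accepted))
```

The balancer never looks at which project a request belongs to, which is why that layer can stay simple and fast, while the limiter starts rejecting the noisy project once it exceeds its cap for the window.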


Everything is fine now


An Evolving Architecture

Sentry began life as a traditional Django application, and has gone through a couple of architecture iterations since. The current Sentry dashboard, which is what customers use to browse and debug their production issues, has evolved into a single-page application written in React and Reflux (an early Flux library). We write ES6 and transpile it to browser-compatible JavaScript using Babel and Webpack. For fetching and submitting data, we communicate with the Django backend through a straightforward REST-based HTTP API.

The event processing pipeline, which is responsible for handling all of the ingested event data that makes it through to our offline task processing, is written primarily in Python. For particularly intense code paths, like our source map processing pipeline, we have begun rewriting those bits in Rust. Rust’s lack of garbage collection makes it a particularly convenient language for embedding in Python. It allows us to easily build a Python extension where all memory is managed from the Python side (if the Python wrapper gets collected by the Python GC, we clean up the Rust object as well).
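The ownership pattern in that last sentence can be sketched in pure Python. Everything here is an assumption for illustration: `FakeNativeLib` stands in for a compiled Rust library (a real extension would call into it through an FFI layer), and `sourcemap_new` / `sourcemap_free` are hypothetical function names. The one real API is the standard library's `weakref.finalize`, which runs a cleanup callback when the wrapper is collected.

```python
import weakref


class FakeNativeLib:
    """Stand-in for a compiled Rust library exposed to Python."""

    def __init__(self):
        self.live_handles = set()
        self._next_handle = 1

    def sourcemap_new(self, data):
        # A real library would allocate a Rust object and return a pointer.
        handle = self._next_handle
        self._next_handle += 1
        self.live_handles.add(handle)
        return handle

    def sourcemap_free(self, handle):
        # A real library would drop the Rust object here.
        self.live_handles.discard(handle)


lib = FakeNativeLib()


class SourceMap:
    """Python wrapper that owns exactly one native object."""

    def __init__(self, data):
        self._handle = lib.sourcemap_new(data)
        # Free the native object when this wrapper is garbage collected.
        self._finalizer = weakref.finalize(self, lib.sourcemap_free, self._handle)

    def close(self):
        # Explicit early cleanup; the finalizer only ever runs once.
        self._finalizer()


source_map = SourceMap(b"{}")
del source_map  # once the wrapper is collected, the native side is freed too
```

Because the native side has no garbage collector of its own, there is only one owner to reason about: when Python lets go of the wrapper, the native object goes with it.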


Sentry Releases animation


A Simple Deploy Workflow

For the most part, Sentry is still a classically monolithic app. This is driven, in part, by the fact that Sentry is still open-source, and we want to make it easy for our community to install and run the server themselves. To do this, we provide installation details for a Docker image that contains all of Sentry’s core services in one place. This monolithic nature makes contributing to and deploying Sentry ourselves relatively straightforward.

When someone wants to commit a change to the codebase, it is submitted as a pull request to our public project on GitHub. From there, Travis CI runs a set of parallelized builds, which include not only unit and integration tests, but also visual regression tests that are managed through Percy. Since we’re still an open-source project that supports different relational databases, we run test suites not only for Postgres, but also for MySQL and SQLite.

Once all tests are green, the code has been reviewed, and any detected UI changes have been approved, the code is merged through GitHub. We then use our own open-source tool, Freight, to build and deploy our Docker image to production. Additionally, Freight injects the only closed-source piece of Sentry, our billing platform. Once the image is in production, we trigger a rolling restart of every Sentry container to pick up the new image.
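A rolling restart is simple to describe in code. The sketch below shows the general technique under stated assumptions, not Freight's behavior or Sentry's deploy scripts: `rolling_restart`, the container names, the image tags, and the health check are all hypothetical.

```python
def rolling_restart(containers, restart, is_healthy, batch_size=1):
    """Restart containers a few at a time, halting if a batch comes back unhealthy."""
    restarted = []
    for start in range(0, len(containers), batch_size):
        batch = containers[start:start + batch_size]
        for name in batch:
            restart(name)
        unhealthy = [name for name in batch if not is_healthy(name)]
        if unhealthy:
            # Stop early so the untouched containers keep serving traffic.
            raise RuntimeError(f"halting rollout, unhealthy: {unhealthy}")
        restarted.extend(batch)
    return restarted


# Hypothetical fleet: container name -> image it is currently running.
running = {f"sentry-web-{i}": "image:v1" for i in range(4)}


def restart(name):
    running[name] = "image:v2"


def is_healthy(name):
    return running[name] == "image:v2"


order = rolling_restart(list(running), restart, is_healthy, batch_size=2)
```

Restarting in batches means some containers are always up on either the old or the new image, so the service keeps accepting events throughout the deploy.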


Sentry plus Slack integration GIF


An Unpredictable World

One of our biggest challenges is that Sentry’s traffic is inherently unpredictable, and there’s simply no way to foresee when a user’s application is going to melt down and send us a huge influx of events. On bare metal, we handled this by preparing for the worst(ish) and over-provisioning machines in case of an event deluge. Unfortunately, as demand grew, our time window for needing new machines shrank. We started demanding more from our provider, requesting machines before they were needed, and keeping common machines idle for days on end, waiting to see which component needed them the most.

For that reason, we made the leap to Google Cloud Platform (GCP) in July 2017 to give ourselves greater flexibility. Calling it a “leap” makes it sound impulsive, but the transition actually took months of planning. And no matter how long we spent projecting resource usage within Google Compute Engine, we never would have predicted our increased throughput. Due to GCP’s default microarchitecture, Haswell, we noticed an immediate performance increase across our CPU-intensive workloads, namely source map processing. The operations team spent the next few weeks making conservative reductions in our infrastructure, and still managed to cut our costs by roughly 20%. No fancy cloud technology, no giant infrastructure undertaking -- just new rocks that were better at math.

You can find way more detail about it on the Google Cloud Platform Blog.

Observability and Action

A big reason we can sustain Sentry is that it falls into a category of observability tooling that requires a non-trivial amount of resources to host. We run Sentry ourselves because we’ve gotten pretty good at it. We rely on Sentry to track errors in our production app and help us set priorities for iteration, based on user experience and impact.

But when it comes to the rest of our monitoring stack, we apply the same thinking as the users signing up for Sentry’s hosted service every day: “It’s better to pay for uptime in dollars than in engineering hours.” (If you haven’t used Sentry’s hosted service, it only takes a couple of minutes and a few lines of code to set up.)

We use a few toolchains outside of our production environment. I could write an essay detailing each (and I probably will), but let’s just outline how I would get notified that we’ve regressed in our 95th percentile of request latency:

  • Each host running a web server sends the timing of requests to Stripe’s Veneur
  • Veneur creates histograms of request timings and forwards those to Datadog
  • A Datadog threshold alert detects we’ve gone higher than 500ms
  • The threshold alert is configured to notify a Slack channel and a PagerDuty rotation
  • The PagerDuty rotation notifies both operations engineers currently on-call
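The core of that chain is a percentile computed over request timings and compared against a threshold. The sketch below collapses the whole pipeline into two functions to show the logic; it does not use Veneur, Datadog, Slack, or PagerDuty APIs, and the function names, the nearest-rank percentile method, and the sample timings are assumptions for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_latency(timings_ms, threshold_ms=500, pct=95):
    """Return the observed percentile and who should be notified, if anyone."""
    value = percentile(timings_ms, pct)
    notify = ["slack", "pagerduty"] if value > threshold_ms else []
    return {"value_ms": value, "notify": notify}


# 5% slow requests: the 95th percentile is still fast, so nobody is paged.
healthy = check_latency([120] * 95 + [900] * 5)

# 10% slow requests: the 95th percentile crosses 500ms and the alert fires.
regressed = check_latency([120] * 90 + [900] * 10)
```

Alerting on the 95th percentile rather than the average means a handful of slow requests will not page anyone, but a regression that affects a meaningful slice of traffic will.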


Sentry welcome gif

We introduce every new employee with their own welcome gif


Fantastic Co-Workers

Our Engineering org is split into four teams in two programs: Product and Infrastructure. Their names do a pretty solid job describing their purposes, but:

  • Product is broken into the Workflow and Growth teams. Workflow focuses specifically on how our users interact with Sentry throughout their own workflows and development processes. Growth looks at the tweaks we can make that will increase the likelihood that a new user will find Sentry relevant, onboard effectively, and stick around to use it more and more.

  • Infrastructure is broken into the Platform and Operations teams. Platform is dedicated to all of the Sentry code that powers our API, including event ingestion. Operations is where I live, and we’re dedicated to building, deploying, maintaining, and monitoring all of the components that keep sentry.io stable.

We also have an unofficial fifth team that plays a large part in Sentry’s development and will always outnumber the others: our open-source contributors. Sentry’s entire codebase is right on GitHub for the whole world to see, and many improvements to our service have been introduced by users and community members who don’t work here.

Other Stacks

Just as Sentry is a part of many software teams’ stacks, we rely on a number of additional commercial and open-source services to help run our business. We use Stripe to handle customer billing, SendGrid for reliable email delivery, Slack for team communication, Google Analytics for basic web analytics, BigQuery for data warehousing, and Jira for project management.

On the open-source side, our growth and BI teams use Redash to derive useful statistics from our data. We use Jekyll to publish sentry.io and other online marketing content, like our blog.

Closing


Sentry team photo


Open source, open company. That’s our credo, and it really captures what we’re all about. As I mentioned earlier, I applied for a job at Sentry because it’s such a nice piece of software, and the people who run the company are mindful about the role of the community. Since everyone who works here is also a member of the open-source community, that mindfulness extends to and flows between employees.

Growth is inevitable here. The hard decision is not what to scale, but when. It’s the Operations team’s responsibility to put engineering hours into the right initiative and balance scale with security, reliability, and productivity. Maybe you want to make some of those hard decisions on my team?

Or maybe operations isn’t your thing, but you want to build something open-source. Want to contribute to Sentry beyond just code? We’re hiring pretty much across the organization and would love to talk to you if you’ve read this entire post and think you still might be as into Sentry as I am.
