How Sentry Receives 20 Billion Events Per Month While Preparing to Handle Twice That

Sentry
Developers use Sentry to cut time to resolution for application issues from five hours to five minutes.

By James Cunningham, Operations Engineer, Sentry.


About Sentry

Sentry illustration

Unless your engineering team is staffed by angels who commute down to the office from heaven every morning, we’re pretty confident you run into plenty of problems developing and iterating on your applications in production. Sentry provides all the tools you need to find, triage, reproduce, and fix application-level issues before your users even know there was a problem. With the added bonus that you won’t get any more nasty looks from support engineers at happy hour.

By automating error detection and aggregating and adding important context to stack traces, Sentry helps you proactively correct the errors that are doing the most harm to your business more efficiently and durably and with minimal disruption. Closing the gap between the product team and customers improves productivity, speeds up the entire development process, and helps engineers focus on what they do best: build apps that make users’ lives better.

I was personally a Sentry user way before I was an employee. Early on at my previous company, I was tasked with upgrading the open-source error tracking service that hadn’t really been maintained or used for a while. I reached out for help and heard back from David (Sentry’s co-founder) and Matt (Sentry’s second engineer), meeting two of my future co-workers on IRC years before I ever saw their faces (protip: connect with Matt on LinkedIn).


This is Matt


They were incredibly helpful and, when I went looking for a new job, I thought, “Hey, this is a very nice piece of software, and the people who are running it are really mindful of their community. I’d love to be a part of that.” Today, I spend my waking hours happily keeping Sentry’s hosted service operational, available, and responsive to our exponentially-increasing event volume (editor’s note: when he’s not trolling new hires on Slack for their taste in hip-hop and Fruit Gushers).

A Powerful Side Project

Sentry started as (and remains) an open-source project, growing out of an error logging tool David built in 2008. He displayed a truly shrewd notion of branding even then, giving the project a catchy name that companies the world over remain jealous of to this day: django-db-log. For the longest time, Sentry’s subtitle on GitHub was “A simple Django app, built with love.” A slightly more accurate description probably would have included Starcraft and Soylent alongside love; regardless, this captured what Sentry was all about.

That original build nine years ago was Django and Celery (a Python library for asynchronous task queues), with Postgres as the database and Redis as the message broker behind Celery.

A Fast-Growing Company

As you might expect, Sentry usage has grown exponentially over the past decade, and the infrastructure has changed and matured to accommodate massive scale. We now host the open-source project as a SaaS product. Sentry has SDKs for just about every framework, platform, and language and integrations with the most popular developer tools, which helps make it incredibly easy to adopt. Today, Sentry is central to the error tracking and resolution workflows of tens of thousands of organizations and more than 100,000 active users around the world, many of whom support implementations for some of the biggest properties on the internet: Dropbox, Uber, Stripe, Airbnb, Xbox Live, HubSpot, and more. That’s 5 billion events per week, just from the hosted service.

When a customer sends events to Sentry, they don’t receive a laundry list of notifications; they get the aggregate issue, with counts of how often it has occurred and which of their users are experiencing it. This is all presented very simply and cleanly in Sentry, but if a user wants individual events, we’ll provide those too. We save every single event we accept, which gets very expensive to do in a traditional relational database.
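As a rough illustration of that aggregation, the sketch below groups raw events into issues and tracks an occurrence count and the set of affected users. The `fingerprint` and `user_id` field names and the `aggregate` helper are assumptions made for this example, not Sentry's actual grouping logic.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Aggregate view of many events that share one fingerprint."""
    count: int = 0
    users: set = field(default_factory=set)


def aggregate(events):
    """Group events by fingerprint (a hypothetical grouping key)."""
    issues = defaultdict(Issue)
    for event in events:
        issue = issues[event["fingerprint"]]
        issue.count += 1
        issue.users.add(event["user_id"])
    return dict(issues)


events = [
    {"fingerprint": "KeyError:views.py", "user_id": "alice"},
    {"fingerprint": "KeyError:views.py", "user_id": "bob"},
    {"fingerprint": "KeyError:views.py", "user_id": "alice"},
    {"fingerprint": "Timeout:tasks.py", "user_id": "carol"},
]
issues = aggregate(events)
```

Four events collapse into two issues here; the individual events would still be stored separately so they can be looked up on request.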

One of the first improvements Sentry made to address scalability was storing all of these events in a distributed key-value store. There are a variety of key-value stores out there, all with their promises and pitfalls, but when evaluating solutions, we ultimately chose Riak. Our Riak cluster does exactly what we want it to: write event data to more than one location, grow or shrink in size upon request, and persist through normal failure scenarios.
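Two of those properties (several copies of each write, and reads that survive the loss of a node) can be sketched with a toy in-memory store. This is not Riak's API or its replication algorithm; `ToyReplicatedStore`, `replicas`, and the hashing scheme are hypothetical and exist only to show the idea.

```python
import hashlib


class ToyReplicatedStore:
    """Writes each value to `replicas` distinct nodes chosen by key hash."""

    def __init__(self, node_count=5, replicas=3):
        self.nodes = [dict() for _ in range(node_count)]
        self.down = set()          # indexes of nodes simulated as failed
        self.replicas = replicas

    def _preference_list(self, key):
        # Hash the key to a starting node, then walk the ring of nodes.
        start = int(hashlib.sha1(key.encode()).hexdigest(), 16) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        for index in self._preference_list(key):
            if index not in self.down:
                self.nodes[index][key] = value

    def get(self, key):
        # Any live replica that holds the key can answer.
        for index in self._preference_list(key):
            if index not in self.down and key in self.nodes[index]:
                return self.nodes[index][key]
        raise KeyError(key)


store = ToyReplicatedStore()
store.put("event:123", b'{"message": "KeyError"}')
first_replica = store._preference_list("event:123")[0]
store.down.add(first_replica)          # simulate losing one node
payload = store.get("event:123")       # still served by another replica
```

A real distributed store also has to handle rebalancing when nodes join or leave and repairing replicas that missed a write, which this sketch leaves out.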

The first major infrastructure project that I contributed to when joining Sentry was horizontally scaling our ability to execute offline tasks. As Sentry runs throughout the day, there are about 50 different offline tasks that we execute—anything from “process this event, pretty please” to “send all of these cool people some emails.” There are some that we execute once a day and some that execute thousands per second.

Managing this variety requires a reliably high-throughput message-passing technology. We run Celery on top of RabbitMQ, and we stumbled upon a great RabbitMQ feature called Federation that allows us to partition our task queue across any number of RabbitMQ servers and gives us the confidence that, if any single server gets backlogged, others will pitch in and distribute some of the backlogged tasks to their consumers.
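The "others will pitch in" behavior can be pictured with a toy model: tasks are spread over several queues, and a consumer whose own queue is empty pulls from whichever queue has the longest backlog. This is a simplified sketch under those assumptions, not how RabbitMQ implements Federation; `PartitionedQueues` and the task names are hypothetical.

```python
from collections import deque


class PartitionedQueues:
    """Toy model: tasks are spread over several queues, and an idle
    consumer helps drain whichever queue has the longest backlog."""

    def __init__(self, partitions):
        self.queues = [deque() for _ in range(partitions)]

    def publish(self, partition, task):
        self.queues[partition].append(task)

    def consume(self, partition):
        """Return the next task for the consumer attached to `partition`."""
        own = self.queues[partition]
        if own:
            return own.popleft()
        # Local queue is empty: pull from the most backlogged partition.
        busiest = max(self.queues, key=len)
        return busiest.popleft() if busiest else None


queues = PartitionedQueues(partitions=3)
for n in range(4):
    queues.publish(0, f"process_event:{n}")   # partition 0 gets backlogged
queues.publish(1, "send_email:0")

handled_by_1 = [queues.consume(1) for _ in range(3)]
```

The consumer on partition 1 first finishes its own task, then takes two tasks from the backlogged partition 0.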

Another project we’ve undertaken is setting up safeguards in front of our application to protect it from unpredictable and unwanted traffic. When accepting events, we would be crazy to just expose the Python web process to the public Internet and say, “Alright, give me all you got!” Instead, we use two different proxying services that sit in front of our web machines:

  • NGINX, our product-aware proxy, enforces many of the upper bounds that we have deemed reasonable. It is responsible for a variety of bounds, but its most popular one is protecting Sentry from exceedingly large event volumes. Every so often, a user will run into a problem where they’ve deployed their code out into the abyss, and their event volume clocks in at a few zeroes higher than what they signed up for.
  • In front of NGINX, we use another proxying service called HAProxy, which acts as a delta of connections without any of that product-awareness logic and has much higher throughput. All it does is accept connections and send them off to different NGINX servers, allowing us to gracefully add or remove NGINX servers as we see fit.
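A minimal sketch of the kind of upper bound described in the first bullet, assuming a fixed per-project quota within each time window. This is illustrative Python, not Sentry's NGINX configuration; `FixedWindowLimiter`, the project identifier, and the limits are hypothetical.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` events per project within each time window."""

    def __init__(self, limit, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.counts = defaultdict(int)   # (project, window index) -> events seen

    def allow(self, project_id):
        window = int(self.clock() // self.window_seconds)
        key = (project_id, window)
        if self.counts[key] >= self.limit:
            return False                 # over quota: reject the event
        self.counts[key] += 1
        return True


now = [0.0]                              # fake clock so the example is deterministic
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: now[0])
results = [limiter.allow("project-a") for _ in range(3)]
now[0] = 61.0                            # the next window starts
after_reset = limiter.allow("project-a")
```

The third event in the first window is rejected, and the project is accepted again once the next window begins. A production limiter would also expire old counters, which this sketch never does.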


Everything is fine now


An Evolving Architecture

Sentry began life as a traditional Django application, and has gone through a couple of architecture iterations since. The current Sentry dashboard, which is what customers use to browse and debug their production issues, has evolved into a single-page application written in React and Reflux (an early Flux library). We write ES6 and transpile to JavaScript using Babel and Webpack. For fetching and submitting data, we communicate with the Django backend through a straightforward REST-based HTTP API.

The event processing pipeline, which is responsible for handling all of the ingested event data that makes it through to our offline task processing, is written primarily in Python. For particularly intense code paths, like our source map processing pipeline, we have begun rewriting those bits in Rust. Rust’s lack of garbage collection makes it a particularly convenient language for embedding in Python. It allows us to easily build a Python extension where all memory is managed from the Python side (if the Python wrapper gets collected by the Python GC, we clean up the Rust object as well).
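The ownership pattern in that last sentence can be sketched in pure Python. The `FakeNativeLib` class below is a stand-in for a compiled Rust library, and `sourcemap_new`, `sourcemap_free`, and `SourceMap` are hypothetical names, not Sentry's actual bindings. The point is only that the Python wrapper registers a finalizer so the native object is released when the wrapper is collected.

```python
import weakref


class FakeNativeLib:
    """Stand-in for a compiled Rust library exposed over a C ABI."""

    def __init__(self):
        self.live_handles = set()
        self._next = 1

    def sourcemap_new(self, data):
        handle = self._next
        self._next += 1
        self.live_handles.add(handle)
        return handle

    def sourcemap_free(self, handle):
        self.live_handles.discard(handle)


lib = FakeNativeLib()


class SourceMap:
    """Python wrapper that owns one native object for its whole lifetime."""

    def __init__(self, data):
        self._handle = lib.sourcemap_new(data)
        # Run the native destructor when this wrapper is garbage collected.
        self._finalizer = weakref.finalize(self, lib.sourcemap_free, self._handle)


smap = SourceMap(b'{"version": 3}')
alive_before = len(lib.live_handles)
del smap                      # in CPython the wrapper is collected right away
alive_after = len(lib.live_handles)
```

Using `weakref.finalize` rather than `__del__` keeps the cleanup callback independent of the wrapper instance, so the callback itself never keeps the wrapper alive.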


Sentry Releases animation


A Simple Deploy Workflow

For the most part, Sentry is still a classically monolithic app. This is driven, in part, by the fact that Sentry is still open-source, and we want to make it easy for our community to install and run the server themselves. To do this, we provide installation details for a Docker image that contains all of Sentry’s core services in one place. This monolithic nature makes contributing to and deploying Sentry ourselves relatively straightforward.

When someone wants to commit a change to the codebase, it is submitted as a pull request to our public project on GitHub. From there, Travis CI runs a set of parallelized builds, which include not only unit and integration tests, but also visual regression tests that are managed through Percy. Since we’re still an open-source project that supports different relational databases, we run test suites not only for Postgres but also for MySQL and SQLite.

Once all tests are green, the code has been reviewed, and any detected UI changes have been approved, the code is merged through GitHub. We then use an internal open-source tool named Freight to build and deploy our Docker image to production. Additionally, Freight injects the only closed-source piece of Sentry, our billing platform. Once the image is in production, we trigger a rolling restart of every Sentry container to pick up the new image.
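A rolling restart can be sketched as restarting containers one batch at a time so the remainder keep serving traffic. The function below is a simplified model under that assumption, not Freight's implementation; the container dictionaries, image tags, and `rolling_restart` helper are hypothetical.

```python
def rolling_restart(containers, new_image, batch_size=1):
    """Restart containers batch by batch so the rest keep serving traffic.

    `containers` is a list of dicts with "name", "image", and "serving" keys
    (a hypothetical shape used only for this sketch).
    Returns the lowest number of containers that were serving at any step.
    """
    min_serving = len(containers)
    for start in range(0, len(containers), batch_size):
        batch = containers[start:start + batch_size]
        for container in batch:
            container["serving"] = False      # drain and stop
        serving_now = sum(c["serving"] for c in containers)
        min_serving = min(min_serving, serving_now)
        for container in batch:
            container["image"] = new_image    # start again on the new image
            container["serving"] = True
    return min_serving


fleet = [{"name": f"web-{i}", "image": "sentry:old", "serving": True} for i in range(4)]
lowest = rolling_restart(fleet, "sentry:new", batch_size=1)
```

With four containers and a batch size of one, at least three are serving at every step, and all four end up on the new image.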


Sentry plus Slack integration GIF


An Unpredictable World

One of our biggest challenges is that Sentry’s traffic is inherently unpredictable, and there’s simply no way to foresee when a user’s application is going to melt down and send us a huge influx of events. On bare metal, we handled this by preparing for the worst(ish) and over-provisioning machines in case of an event deluge. Unfortunately, as demand grew, our time window for needing new machines shrank. We started demanding more from our provider, requesting machines before they were needed, and keeping common machines idle for days on end, waiting to see which component needed them the most.

For that reason, we made the leap to Google Cloud Platform (GCP) in July 2017 to give ourselves greater flexibility. Calling it a “leap” makes it sound impulsive, but the transition actually took months of planning. And no matter how long we spent projecting resource usage within Google Compute Engine, we never would have predicted our increased throughput. Due to GCP’s default microarchitecture, Haswell, we noticed an immediate performance increase across our CPU-intensive workloads, namely source map processing. The operations team spent the next few weeks making conservative reductions in our infrastructure, and still managed to cut our costs by roughly 20%. No fancy cloud technology, no giant infrastructure undertaking -- just new rocks that were better at math.

You can find way more detail about it on the Google Cloud Platform Blog.

Observability and Action

A big reason we can sustain Sentry is that it falls into a category of observability tooling that requires a non-trivial amount of resources to host. We run Sentry ourselves because we’ve gotten pretty good at it. We rely on Sentry to track errors in our production app and help us set priorities for iteration, based on user experience and impact.

But when it comes to the rest of our monitoring stack, we apply the same thinking as the users signing up for Sentry’s hosted service every day: “It’s better to pay for uptime in dollars than in engineering hours.” (If you haven’t used Sentry’s hosted service, it only takes a couple minutes and a few lines of code to set up.)

We use a few toolchains outside of our production environment. I could write an essay detailing each (and I probably will), but let’s just outline how I would get notified that we’ve regressed in our 95th percentile of request latency:

  • Each host running a web server sends the timing of requests to Stripe’s Veneur
  • Veneur creates histograms of request timings and forwards those to Datadog
  • A Datadog threshold alert detects we’ve gone higher than 500ms
  • The threshold alert is configured to notify a Slack channel and a PagerDuty rotation
  • The PagerDuty rotation notifies both operations engineers currently on-call
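The threshold check in the middle of that chain boils down to computing a 95th percentile and comparing it with 500 ms. The sketch below does this exactly over a list of timings using the nearest-rank method; it is an illustration only, and it does not use the Veneur or Datadog APIs, which work from aggregated histograms rather than raw samples. The `percentile` and `should_alert` helpers are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)   # 1-based rank
    return ordered[max(rank, 1) - 1]


def should_alert(timings_ms, threshold_ms=500, pct=95):
    """True when the chosen percentile of request latency exceeds the threshold."""
    return percentile(timings_ms, pct) > threshold_ms


healthy = [120] * 96 + [900] * 4      # only 4% of requests are slow
regressed = [120] * 90 + [900] * 10   # 10% of requests are slow
healthy_alert = should_alert(healthy)
regressed_alert = should_alert(regressed)
```

The healthy sample stays quiet because fewer than 5% of its requests are slow, while the regressed sample pushes the 95th percentile past the threshold.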


Sentry welcome gif

We introduce every new employee with their own welcome gif


Fantastic Co-Workers

Our Engineering org is split into four teams in two programs: Product and Infrastructure. Their names do a pretty solid job describing their purposes, but:

  • Product is broken into the Workflow and Growth teams. Workflow focuses specifically on how our users interact with Sentry throughout their own workflows and development processes. Growth looks at the tweaks we can make that will increase the likelihood that a new user will find Sentry relevant, onboard effectively, and stick around to use it more and more.

  • Infrastructure is broken into the Platform and Operations teams. Platform is dedicated to all of the Sentry code that powers our API, including event ingestion. Operations is where I live, and we’re dedicated to building, deploying, maintaining, and monitoring all of the components that keep sentry.io stable.

We also have an unofficial fifth team that plays a large part in Sentry’s development and will always outnumber the others: our open-source contributors. Sentry’s entire codebase is right on GitHub for the whole world to see, and many improvements to our service have been introduced by users and community members who don’t work here.

Other Stacks

Just as Sentry is a part of many software teams’ stacks, we rely on a number of additional commercial and open-source services to help run our business. We use Stripe to handle customer billing, SendGrid for reliable email delivery, Slack for team communication, Google Analytics for basic web analytics, BigQuery for data warehousing, and Jira for project management.

On the open-source side, our growth and BI teams use Redash to derive useful statistics from our data. We use Jekyll to publish sentry.io and other online marketing content, like our blog.

Closing


Sentry team photo


Open source, open company. That’s our credo, and it really captures what we’re all about. As I mentioned earlier, I applied for a job at Sentry because it’s such a nice piece of software, and the people who run the company are mindful about the role of the community. Since everyone who works here is also a member of the open-source community, that mindfulness extends to and flows between employees.

Growth is inevitable here. The hard decision is not what to scale, but when. It’s the Operations team’s responsibility to put engineering hours into the right initiative and balance scale with security, reliability, and productivity. Maybe you want to make some of those hard decisions on my team?

Or maybe operations isn’t your thing, but you want to build something open-source. Want to contribute to Sentry beyond just code? We’re hiring pretty much across the organization and would love to talk to you if you’ve read this entire post and think you still might be as into Sentry as I am.
