How CircleCI Processes 4.5 Million Builds Per Month

CircleCI’s continuous integration and delivery platform helps software teams rapidly release code with confidence by automating the build, test, and deploy process. CircleCI offers a modern software development platform that lets teams ramp quickly, scale easily, and build confidently every day.

By Rob Zuber, CTO at CircleCI.



CircleCI is a platform for continuous integration and delivery. Thousands of engineers trust us to run tests and deploy their code, so they can focus on building great software. That trust rests on a solid stack of software that we use to keep people shipping and delivering value to their users.

As CTO of Engineering at CircleCI, I help make the big technical decisions and keep our teams happy and out of trouble. Before this, I was CTO of Copious, where I learned a lot of important lessons about tech in service of building a consumer marketplace. I like snowboarding, Funkadelic, and viscous cappuccino.

The Teams

Engineers are people. People work better in small groups. So we’ve divided our team into several functional units, inspired by Spotify’s pods. We’re much smaller, so we’ve adapted their ideas to meet our needs, while maintaining the core principle that each team has the resources they need to implement a feature across the stack.

But we think of these teams as more of a guideline than actual rules, so folks are free to move around if it means they’ll be more engaged in the work. Flexibility is a key value at CircleCI: it has to be, with the majority of our engineers working remotely across multiple time zones. To keep everyone on the same page, we use Zoom for videoconferencing and screensharing and update statuses in Pingboard to keep track of who’s “in the office”.

We use JIRA to create consistency in our processes across teams. This consistency lets us stay more nimble if engineers ever need or want to switch teams. We use GitHub for version control and Slack for Giphy control. In addition to chat, we use Slack-based integrations with tools like Hubot, PagerDuty, and Looker to give us central access to many day-to-day tasks.

But you didn’t come here to read about how many Slack channels we have (241); you’re here to read about...

The Stack


Most of CircleCI is written in Clojure. It’s been this way since almost the beginning. While there were some early spikes in Rails, the passion of a sole developer won out; by the time CircleCI was released to the market, it was written entirely in Clojure, which has been at our platform’s core ever since.

Our frontend used to be in CoffeeScript, but when Om made a single-page ClojureScript application viable, we opted for consistency and unification. This choice wasn’t that hard to make, given how much we enjoy using Clojure. Having a lingua franca also helps reduce overhead when engineers want to move between layers of the stack.

That doesn’t mean we won’t sharpen other tools when warranted. The build agent for our recently launched 2.0 platform is written in Go, which lets us quickly inject a multi-platform static binary into environments where we can’t lean on a bunch of dependencies. We also use Go for CLI tools where static dependency compilation and fast start-up are more important than our love of Clojure.

But as we pull microservices out of our monolith, Clojure remains our weapon of choice. We’ve already got over ten microservices, and that number is growing rapidly. A major part of this velocity stems from using Clojure, which ensures developers can rapidly move between teams and projects without climbing a huge learning curve.

The Frontend

Our web app’s UI is written in ClojureScript. Specifically, we’re using the framework Om, a ClojureScript interface to Facebook’s React. This is currently in some flux, since we’re upgrading to Om Next, an Om reboot which fixes a lot of its quirks. You can read more about why we’re so excited in this deep dive by one of our engineers, Peter Jaros.


The Backend

Two Pools, Both Alike in Dignity

There are two major pools of machines: the first hosts our own services — the systems that serve our site, manage jobs, send notifications, etc. These services are deployed within Docker containers orchestrated in Kubernetes. In 2012, this configuration wasn’t really an option. As functional programmers, though, we were big believers in immutable infrastructure, so we went all in on baking AMIs and rolling them on code changes.

However, long boot times and charges rounded up to the hour made using full VMs slow and expensive; rolling deploys in Docker with Kubernetes are much more efficient. Kubernetes’ ecosystem and toolchain made it an obvious choice for our fairly statically defined processes: the rate of change of job types or how many we need in our internal stack is relatively low.

On the other hand, our customers’ jobs are changing constantly. It’s challenging to dynamically predict demand, what types of jobs we’ll be running, and the resource requirements of each of those jobs. We found that Nomad excelled in this area. With a fast, flexible scheduler built-in, Nomad distributes customer jobs across our second pool of machines, reserved specifically for scheduling purposes.
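To make the scheduling problem concrete, here is a minimal sketch of bin-packing placement, the general kind of decision a scheduler has to make when it spreads jobs with differing resource needs across a pool of machines. The `Node` and `place` names, the resource units, and the best-fit rule are all illustrative assumptions; this is not Nomad’s algorithm or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A machine in the customer-job pool (hypothetical model)."""
    name: str
    cpu_free: int   # free CPU, in MHz
    mem_free: int   # free memory, in MB
    jobs: list = field(default_factory=list)


def place(nodes: list, job_id: str, cpu: int, mem: int) -> Optional[str]:
    """Best-fit placement: choose the fitting node that leaves the least CPU spare."""
    candidates = [n for n in nodes if n.cpu_free >= cpu and n.mem_free >= mem]
    if not candidates:
        return None  # no capacity; the caller would keep the job queued
    best = min(candidates, key=lambda n: n.cpu_free - cpu)
    best.cpu_free -= cpu
    best.mem_free -= mem
    best.jobs.append(job_id)
    return best.name


pool = [Node("node-a", cpu_free=4000, mem_free=8192),
        Node("node-b", cpu_free=2000, mem_free=4096)]
first = place(pool, "build-1", cpu=2000, mem=4096)   # fits node-b exactly
second = place(pool, "build-2", cpu=3000, mem=2048)  # only node-a has room
third = place(pool, "build-3", cpu=3000, mem=2048)   # nothing left that fits
```

Packing jobs tightly keeps whole machines free for large jobs, which matters when the mix of job sizes is hard to predict.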

While we did evaluate both Kubernetes and Nomad to do All These Things, neither tool was optimized for such an all-inclusive job. And we treat Nomad’s scheduling role as more a piece of our software stack than as a part of the management or ops layer. So we use Kubernetes to manage the Nomad servers.

We’ve also recently started using Helm to make it easier to deploy new services into Kubernetes. We’ve had to build a couple of small services to string the full CD process together with Helm while keeping Kubernetes locked down, but the results have been great. We create a chart (i.e. package) for each service. This lets us easily roll back new software and gives us an audit trail of what was installed or upgraded.


For the last five years, we’ve run our infrastructure on AWS. It started simply because our architecture was simple but evolved into a necessarily complex stack of Linked Accounts, VPCs, Security Groups, and everything else AWS offers to help partition and restrict resources. We’re also running across multiple regions. Our deep investment in AWS led to increasing assumptions in our code about how the software was being managed.

When we introduced CircleCI Enterprise (our on-prem offering), we started supporting a number of different deployment models. We also started separating ourselves further from the system by packaging our code in Docker containers and using cloud-agnostic Kubernetes to manage resources and distribution.

With a much lower level of vendor lock-in, we’ve gained the flexibility to push part of our workload to Google Cloud Platform (GCP) when it suits us. We chose GCP because it’s particularly well-suited for short-lived VMs. Today, if you use our machine executor to run a job, it will run in GCP. This executor type allocates a full VM for tasks that need it.

We’ve also wrapped GCP in a VM service that preallocates machines, then tears everything down once you’re finished. Using an entire VM means you have full control over a much faster machine. We’re pretty happy with this architecture since it smooths out future forays into other platforms: we can just drop in the Go build agent and be on our merry way.
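The preallocation idea can be sketched as a small warm pool: keep a target number of VMs booted ahead of demand, hand one out per job, and never reuse it afterwards. The `WarmPool` class, its method names, and the `vm-N` identifiers are hypothetical, and `_boot` stands in for whatever cloud API call actually creates a machine.

```python
from collections import deque
import itertools


class WarmPool:
    """Keeps a target number of pre-booted VMs ready (hypothetical model)."""

    def __init__(self, target: int):
        self.target = target
        self._ids = itertools.count(1)
        self.ready = deque()
        self.in_use = set()
        self.refill()

    def _boot(self) -> str:
        # Stand-in for a slow cloud API call that creates a VM.
        return f"vm-{next(self._ids)}"

    def refill(self) -> None:
        # Top the pool back up to the target size.
        while len(self.ready) < self.target:
            self.ready.append(self._boot())

    def acquire(self) -> str:
        # Hand out a pre-booted VM if one exists; otherwise boot on demand.
        vm = self.ready.popleft() if self.ready else self._boot()
        self.in_use.add(vm)
        self.refill()
        return vm

    def release(self, vm: str) -> None:
        # VMs are never reused: tear down and rely on refill for fresh ones.
        self.in_use.discard(vm)


pool = WarmPool(target=2)
vm = pool.acquire()
```

In a real service the refill would run in the background so that `acquire` never waits on a boot; it is inlined here to keep the sketch short.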

Communication with Frontend

When the frontend needs to talk to the backend, it does so via a dedicated tier of API hosts. These API hosts are also managed by Kubernetes, albeit in a separate cluster to increase isolation. Nearly all our APIs are public, which means we’re using the same interfaces available to our customers. The value of dogfooding your APIs can’t be overstated: it’s enabled us to keep the APIs clean and spot errors before our users find them.

If you’re interacting with our web application, then all of your requests are hitting the API hosts. The majority of our authentication is handled via OAuth from GitHub or Bitbucket. Once you’ve authenticated, you can also generate an API token to get programmatic access to everything we expose in the UI.

Our API hosts once accepted webhooks from GitHub and Bitbucket, but we’ve recently extracted that into its own service. Using a cleanly-separated service that dumps hooks into RabbitMQ allows us to more easily respond to a large array of operational issues. When version control system (VCS) providers are recovering from their own issues, we’ve seen massive spikes in hooks. Now we’re well equipped to deal with that.
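A minimal sketch of why a queue helps with hook spikes: the intake path only validates and enqueues, so it can acknowledge a burst quickly, while workers drain at their own pace. Here `queue.Queue` is an in-memory stand-in for RabbitMQ, and `receive_hook`, `drain`, and the payload shape are assumptions made for illustration.

```python
import json
import queue

hook_queue = queue.Queue()  # in-memory stand-in for a RabbitMQ queue


def receive_hook(raw_body: bytes) -> int:
    """Intake path: validate minimally, enqueue, acknowledge right away."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return 400  # malformed body; nothing is queued
    hook_queue.put(payload)
    return 202  # accepted for later processing


def drain(handler, max_items: int) -> int:
    """Worker path: process at most max_items hooks per pass."""
    done = 0
    while done < max_items:
        try:
            payload = hook_queue.get_nowait()
        except queue.Empty:
            break
        handler(payload)
        done += 1
    return done


# A burst of five hooks arrives; the worker handles three per pass.
statuses = [receive_hook(json.dumps({"push": i}).encode()) for i in range(5)]
seen = []
first_pass = drain(seen.append, max_items=3)
second_pass = drain(seen.append, max_items=3)
```

Because intake and processing are decoupled, a spike shows up as queue depth rather than as dropped or timed-out hooks.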

Data! Data! Data!

Our primary datastore is MongoDB. We made this decision in CircleCI’s early days — lured like so many others by the simplicity of “schemaless” storage and rapid iteration. Having peaked at over 10TB of bloated storage in MMAP, along with painful, outage-inducing DB-level locks in Mongo 2.4, we’re happy to see progress being made in WiredTiger. Our operations have greatly improved, but we’re still suffering from a legacy of poorly-enforced schemas on a dataset too large to clean efficiently.

So we’re retreating to the structure of PostgreSQL. We’ve got a great opportunity for this migration as we build microservices with their own datastores. We’re also using Redis to cache data we’d never store permanently, as well as to rate-limit our requests to partners’ APIs (like GitHub).
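One common way to rate-limit outbound calls with a shared counter store is a fixed-window counter: increment a per-partner key for the current time window and refuse calls once the count passes the limit. The sketch below uses a plain dict in place of Redis; the class name, the limit, and the window size are illustrative assumptions rather than a description of our implementation.

```python
import time


class FixedWindowLimiter:
    """Fixed-window request counter (dict stands in for Redis keys with a TTL)."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}

    def allow(self, partner: str) -> bool:
        window_id = int(self.clock() // self.window)
        # Drop counters from earlier windows, as a key expiry would.
        self.counters = {k: v for k, v in self.counters.items() if k[1] == window_id}
        key = (partner, window_id)
        count = self.counters.get(key, 0) + 1
        self.counters[key] = count
        return count <= self.limit


# A fake clock makes the behavior deterministic for the example.
now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: now[0])
results = [limiter.allow("github") for _ in range(3)]  # third call exceeds the limit
now[0] = 61.0                                          # a new window begins
after_reset = limiter.allow("github")
```

Keeping the counter in a shared store rather than in process memory is what lets many hosts respect a single limit.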

When we’re dealing with large blobs of immutable data (logs, artifacts, and test results), we store them in Amazon S3. We’re well beyond the scale where we could just dump this kind of stuff in a database. We handle any side-effects of S3’s eventual consistency model within our code to ensure that we deal with user requests correctly while writes are in process.
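One simple way to cope with a store where a fresh write may not be visible immediately is to retry the read with exponential backoff. This is a sketch of that general technique, not our actual code: `read_with_retry`, `NotYetVisible`, and the `LaggyStore` fake are hypothetical names, and the fake only simulates delayed visibility.

```python
import time


class NotYetVisible(Exception):
    """Raised when a recently written object cannot be read yet."""


def read_with_retry(fetch, key, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Retry a read with exponential backoff until the object is visible."""
    for attempt in range(attempts):
        try:
            return fetch(key)
        except NotYetVisible:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * (2 ** attempt))


class LaggyStore:
    """Fake blob store whose reads fail a fixed number of times after a write."""

    def __init__(self, lag_reads):
        self.lag_reads = lag_reads
        self.data = {}
        self.misses = {}

    def put(self, key, value):
        self.data[key] = value
        self.misses[key] = self.lag_reads

    def get(self, key):
        if self.misses.get(key, 0) > 0:
            self.misses[key] -= 1
            raise NotYetVisible(key)
        return self.data[key]


store = LaggyStore(lag_reads=2)
store.put("logs/build-1", b"ok")
delays = []  # record the backoff delays instead of actually sleeping
value = read_with_retry(store.get, "logs/build-1", sleep=delays.append)
```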

A Build is Born

When we process a webhook from GitHub/Bitbucket telling us that a user pushed some new code, we use the information to create a new build or workflow representation in our datastores, then queue it for processing. In order to get promoted out of this first queue, the organization needs to have enough capacity in its plan to run the build/workflow.

If you’re a customer using all your containers, no new builds or workflows are runnable until enough containers free up. When that happens, we’ll pass the definition of the work to be performed to Nomad, which is responsible for allocating hardware for the work’s duration.
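The capacity gate described above can be sketched as a per-organization queue that promotes builds only while the plan has free containers. The `OrgQueue` class, its field names, and the strict first-in, first-out promotion order are assumptions made for illustration.

```python
from collections import deque


class OrgQueue:
    """Per-organization build queue gated by plan capacity (hypothetical)."""

    def __init__(self, plan_containers: int):
        self.plan_containers = plan_containers
        self.in_use = 0
        self.waiting = deque()   # (build_id, containers_needed)
        self.dispatched = []     # builds handed to the scheduler

    def enqueue(self, build_id: str, containers: int = 1) -> None:
        self.waiting.append((build_id, containers))
        self.promote()

    def promote(self) -> None:
        # Promote strictly in arrival order while the plan has room.
        while self.waiting:
            build_id, need = self.waiting[0]
            if self.in_use + need > self.plan_containers:
                break
            self.waiting.popleft()
            self.in_use += need
            self.dispatched.append(build_id)

    def finish(self, containers: int = 1) -> None:
        # A build completed: free its containers and try to promote more work.
        self.in_use -= containers
        self.promote()


org = OrgQueue(plan_containers=2)
for build in ("b1", "b2", "b3"):
    org.enqueue(build)
before = list(org.dispatched)   # b3 waits: both containers are busy
org.finish()                    # one build completes, freeing a container
after = list(org.dispatched)
```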

Running the Build

The gritty details of processing a build are executed by the creatively named build agent. It parses configuration, executes commands, and synthesizes actions that create artifacts and test results. Most builds run in a Docker container, or set of containers, which is defined by the customer for a completely tailored build environment.


The build agent streams the results of its work over gRPC to the output processor, a secure facade that understands how to write to all our internal systems. This facade approach allows our 1.0 and 2.0 platforms to coexist.

To get this live streaming data to your browser, we use WebSockets managed by Pusher. We also use this channel to deliver state change notifications to the browser, e.g. when a build completes. Before writing output permanently to S3, we store small segments temporarily in Redis until we’ve collected enough.
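The buffering step can be sketched as follows: accumulate small segments per build and write one larger blob once a size threshold is reached, with a final flush when the build ends. The `SegmentBuffer` class, the byte threshold, and the `write_blob` callback are hypothetical; a dict stands in for Redis and a list for S3.

```python
class SegmentBuffer:
    """Collects small output segments and flushes them as one larger blob."""

    def __init__(self, flush_bytes: int, write_blob):
        self.flush_bytes = flush_bytes
        self.write_blob = write_blob   # e.g. a function that writes to blob storage
        self.pending = {}              # build_id -> list of byte segments

    def append(self, build_id: str, segment: bytes) -> None:
        parts = self.pending.setdefault(build_id, [])
        parts.append(segment)
        if sum(len(p) for p in parts) >= self.flush_bytes:
            self.flush(build_id)

    def flush(self, build_id: str) -> None:
        # Write whatever is buffered for this build, if anything.
        parts = self.pending.pop(build_id, [])
        if parts:
            self.write_blob(build_id, b"".join(parts))


blobs = []
buf = SegmentBuffer(flush_bytes=10, write_blob=lambda b, data: blobs.append((b, data)))
buf.append("build-1", b"hello ")   # 6 bytes, stays buffered
buf.append("build-1", b"world")    # 11 bytes total, triggers a flush
buf.append("build-1", b"!")        # starts a new buffer
buf.flush("build-1")               # final flush when the build completes
```

Batching this way trades a little latency on the permanent copy for far fewer, larger writes to blob storage.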

A Hubot Postscript

We have added very little to the CoffeeScript Hubot application – just enough to allow it to talk to our Hubot workers. The Hubot workers implement our operational management functionality and expose it to Hubot so we can get chat integration for free. We’ve also tailored the authentication and authorization code of Hubot to meet the needs of roles within our team.

For larger tasks, we’ve got an internal CLI written in Go that talks to the same API as Hubot, giving access to the same functionality we have in Slack, with the addition of scripting, piping, and all of our favorite Unix tools. When the Hubot worker recognizes the CLI is in use, it logs the commands to Slack to maintain visibility of operational changes.

Analytics & Monitoring

Our primary source of monitoring and alerting is Datadog. We’ve got prebuilt dashboards for every scenario and integration with PagerDuty to manage routing any alerts. We’ve definitely scaled past the point where managing dashboards is easy, but we haven’t had the time to invest in figuring out Datadog’s anomaly detection features, nor the willingness to trust that they will just work for us. We capture any unhandled exceptions with Rollbar and, if we realize one will keep happening, we quickly convert it into a metric reported to Datadog, to keep Rollbar as clean as possible. We’re also using LaunchDarkly to safely deploy new and/or incomplete features behind feature flags.

We use Segment to consolidate all of our trackers, the most important of which goes to Amplitude to analyze user patterns. However, if we need a more consolidated view, we push all of our data to our own data warehouse running Postgres; this is available for rapid analytics and dashboard creation through Looker. Many engineers who want to do their own analysis use tools they’re comfortable with, which includes sed and awk but also Pandas and R.


One of the great things about being a CI/CD company is that we get to practice what we preach. Instead of long dry spells between releases, we push several changes per day to keep our feedback loops short and our codebase clean. We’re small enough that we can move quickly, but large enough that our teams have the resources they need.

This is our stack today. As our customers deal with more complex problems, we’ll adapt and adopt new tools to deal with emerging tech. It’s all very exciting, and we can’t wait to see what the future holds.

While we wait for the future, though, there’s no reason you should be waiting for good code. Start building on CircleCI today and ship your code faster. Or come work with us and help us ship our own code faster.

P.S. If you're already a CircleCI customer, head over to our community site and share your stack to get some free swag.

