How We Designed Our Continuous Integration System to be More Than 50% Faster

By Urvashi Reddy | Software Engineer, Engineering Productivity Team


Earlier this year, the Engineering Productivity team at Pinterest published a blog post called How a one line change decreased our clone times by 99%. In that post, we described how a simple Git configuration sped up clone times in one of our largest repositories at Pinterest. In this post, we’ll talk about how we significantly decreased build times in the CI pipeline that serves another major repository at Pinterest. Spoiler alert: it was not a one line change!

The content covered in this blog was presented at BazelCon 2020. Check out the presentation Designing a Language Agnostic CI with Bazel Queries.

The Engineering Productivity team’s vision is to “build a developer platform that inspires developers to do their best work.” One of the integral pieces of this platform is the Continuous Integration (CI) pipelines. The CI pipelines are responsible for validating code changes and producing release artifacts that can be deployed to one of our supported Continuous Delivery platforms. With over 1,000 engineers at the company, our team is faced with an interesting challenge of providing reliable and fast CI pipelines that serve the major repositories at scale.

To meet those goals, our team made a few key design choices:

  • Adopt Bazel as our build tool
  • Use language-based monorepos
  • Test and release only the changed code
  • Leverage Bazel’s BUILD file as a contract between CI and developers
  • Create release abstractions with custom Bazel rules
  • Parallelize work as much as possible

Adopting Bazel

Choosing a single build tool that is multi-language allows our team to create CI workflows that can be applied to any repository using Bazel at Pinterest. Since Bazel is hermetic by design, we can run Bazel targets in CI without needing to configure or manage dependencies on the host machines.

Language-based monorepos

Our CI pipelines are triggered when code is committed to a repository, which means having a CI pipeline for every repository. In order to limit the number of CI pipelines and repositories we have to manage, we group our services into one repository per language.

If you’re interested in learning more about the above two choices, check out another BazelCon 2020 presentation from our team called Pinterest’s Journey to a Monorepo.

Test and release only changed code

At scale, running every target within a repository is expensive. Even with Bazel’s cache, running all test targets with bazel test //... still means spending time fetching and loading dependencies for those targets. Those network calls to the cache are time consuming and can be avoided entirely if only the minimal set of targets is run.

Additionally, the services within our monorepos vary in commit frequency. Some of them are contributed to daily while others are more sporadic. By running in CI only what’s affected by a change, we can significantly speed up our build times.

So how do we get the minimal set of targets to run? We created a Golang CLI called build-collector to report the targets to run in CI. The CLI takes in a set of commits and uses Bazel queries to output the list of targets to run. It looks at the files that were changed and runs the appropriate query to find the affected targets. For example, if a couple of source code files were changed, build-collector would run the following query:
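For illustration, a query of this shape might look like the one below. The changed file paths are hypothetical, and the exact invocation is a sketch of the approach rather than build-collector’s literal output:

    bazel query "rdeps(//..., set(services/foo/src/main/java/FooService.java services/foo/src/main/java/FooConfig.java))"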

The above command uses the rdeps query function to find the reverse dependencies of the source files. The output is a list of targets we can run in CI. In order to get test targets specifically, build-collector wraps the above with a filter using the kind query function:
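Continuing the hypothetical example above, the wrapped query might look like this (the '.*_test' pattern is an assumption about how the filter is expressed):

    bazel query "kind('.*_test', rdeps(//..., set(services/foo/src/main/java/FooService.java services/foo/src/main/java/FooConfig.java)))"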

Note: Alternatively, the tests query function can be used to filter for test targets.

This is just one type of query that build-collector runs. The full list of queries is explained in the Designing a Language Agnostic CI with Bazel Queries presentation. At this point, you might be wondering how we get release targets. We’ll cover that further below when we talk about our custom release rule implementation.

In our CI script, we call build-collector in the following way:
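A sketch of that invocation is shown below. The flag names and commit range variables are assumptions for illustration rather than build-collector’s actual interface:

    # Commit range and flag names are hypothetical, for illustration only.
    build-collector \
      --commits "${BASE_COMMIT}..${HEAD_COMMIT}" \
      --output-dir ./collector-output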

Here’s an example of build-collector’s output file for test targets. The same JSON schema is used for release targets as well.
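The snippet below illustrates the general shape of that output; the field name and target labels are illustrative assumptions rather than the exact schema:

    {
      "targets": [
        "//services/foo:unit_tests",
        "//services/foo:integration_tests",
        "//common/util:util_tests"
      ]
    }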

The CI script parses the JSON files output by build-collector and runs the targets in parallel across multiple workers.

Leverage Bazel’s BUILD file as a contract between CI and developers

We want Pinterest engineers to focus on feature work and not have to learn too many things about how our CI is set up. Given that we also want to run the minimal set of targets in CI, we need a way for developers to communicate which targets are for local development and what parts of their service should be tested and released in CI. This is where the Bazel BUILD file comes in handy since it is already the place that developers are defining tests and release artifacts pertaining to their service. Developers follow a few simple conventions in the BUILD file so that our CI can figure out exactly what to build.

Those conventions are:

  • Use test rule types for running tests (standard Bazel practice)
  • Use Pinterest custom release rules for generating release artifacts
  • Use supported tags like no-ci to indicate what should not run in the pipelines (see the sketch after this list)
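For example, the last convention might look like this in a BUILD file (a minimal sketch; the rule and target names are hypothetical):

    java_test(
        name = "legacy_smoke_test",
        srcs = ["LegacySmokeTest.java"],
        tags = ["no-ci"],  # tells the CI pipelines to skip this target
    )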

Create release abstractions with custom Bazel rules

At Pinterest, we support a number of different artifact types that are released to various stores. For example, docker images are sent to an internal registry, jars are published to Artifactory, etc. To support these workflows, we implemented custom Bazel rules for the common use cases at Pinterest. The custom rules help us to create an abstraction over the infrastructure. All developers need to do is indicate what they want to publish within the BUILD file using our custom rules.

A common workflow is creating and publishing docker images that are then referenced when deploying to EC2 or Kubernetes. Below is an example of how engineers can use a custom Bazel rule called container_release to get CI to make their release artifacts available for deployment.
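The snippet below sketches what such a BUILD file could look like. The container_image rule and its attributes come from the open source rules_docker project; the load path and attributes of container_release are assumptions for illustration, since that rule is internal to Pinterest:

    load("@io_bazel_rules_docker//container:container.bzl", "container_image")
    load("//tools/build/release:release.bzl", "container_release")  # hypothetical load path

    container_image(
        name = "foo_service_image",
        base = "@java_base//image",
        files = [":foo_service_deploy.jar"],
        cmd = ["java", "-jar", "foo_service_deploy.jar"],
    )

    container_release(
        name = "foo_service_release",
        image = ":foo_service_image",
        # Deployment artifacts to expose to Teletraan (EC2) and Hermez (Kubernetes);
        # these attribute names are illustrative assumptions.
        teletraan_configs = [":teletraan_deploy_config"],
        kubernetes_manifests = [":k8s_manifests"],
    )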

In this example BUILD file, this service has created a docker image using the open source container_image rule. Using the custom container_release rule, the service author can publish the docker image to a Pinterest registry as well as specify what deployment artifacts should be made available to our Continuous Delivery platforms (Teletraan for EC2 deployments and Hermez for Kubernetes workloads).

There are a few benefits we get from implementing custom release rules:

  • It abstracts the infrastructure layers for our developers. They don’t have to be aware of where the deployment artifacts end up and how they are consumed by our CD platforms.
  • Developers control what parts of their service are released via version-controlled code.
  • We can support dev versions of their release artifacts by controlling where the artifacts are released.

That last benefit is made possible with another release abstraction within the custom release rule implementation. Each Pinterest custom release rule is actually a Bazel macro that generates two custom release rules: artifact_ci_release and artifact_dev_release. Our developers don’t see or interact with these rules directly, but our CI and local development workflows use them to ensure that release targets run in the right context. For instance, below is the Bazel query build-collector runs to obtain release targets for source code changes:
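Continuing the earlier hypothetical change set, a sketch of that query might be (the '.*_ci_release' kind pattern is an assumption based on the naming described above):

    bazel query "kind('.*_ci_release', rdeps(//..., set(services/foo/src/main/java/FooService.java)))"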

A further optimization we made here was to control, within the artifact_*_release implementation, the order in which the release artifacts are published. For instance, the docker images are published to the registry before we publish the YAML files that reference them. Doing it this way made the querying logic in build-collector fast and straightforward.

Parallelize as much as possible

In order to make sure our CI is running as fast as possible, we want to parallelize running targets wherever possible. We currently achieve this with what we call the Dispatcher Model. The Dispatcher Model is pretty simple: we figure out what targets need to run in CI and dispatch the execution of those targets to workers that run in parallel.
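As a rough sketch of that idea (not Pinterest’s actual dispatcher code), a coordinator could shard the collected targets and hand each shard to a worker:

    # Split the collected test targets round-robin across four workers (file names are hypothetical).
    jq -r '.targets[]' collector-output/test_targets.json | split -n r/4 - shard_
    # Each worker then runs only its shard, e.g. the first worker:
    bazel test $(cat shard_aa)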

This has a significant benefit when running release targets. If a developer only cares about releasing artifacts from a few services they contribute to, they shouldn’t have to wait for all the other services in the CI build to be finished. Running release targets independently and in parallel provides developers with their release artifacts as soon as they are ready.

What about Pull Request (PR) builds?

Our PR pipeline kicks off a CI run every time a new pull request is created. We patch the code changes and use a temporary commit to pass to build-collector. This allows us to easily reuse the same CI setup for PR builds as well. The only difference is that we don’t create any release artifacts and instead check that the release targets can compile by running bazel build.
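A minimal sketch of that compile-only check, reusing the hypothetical output file from above, could be:

    # On PRs, build (but don't publish) the affected release targets.
    jq -r '.targets[]' collector-output/release_targets.json | xargs bazel build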

The Results

At the beginning of this year, we invested in the above design choices in a repo called Optimus. Optimus is a monorepo that houses 120+ Java services and holds some of our most critical data platforms at Pinterest. Optimus and its CI pipeline serve 300 monthly active contributors.

Before these changes, we weren’t using build-collector or the release rule abstractions in Optimus. Instead, we ran all the targets within a service whenever code changed, and we had granular release rules for releasing artifacts. At that time, the P50 CI build time was 52 minutes and the P90 was 69 minutes. After migrating to our new CI design, the P50 dropped to 19 minutes and the P90 to 49 minutes within a week. That’s 2.7x faster at P50 and 1.4x at P90!

Chart comparing build times the week before and after the CI migration

One month distribution of build times with the old CI

One month distribution of build times with the new CI

Learnings

Bazel is a powerful tool we can leverage to create a Continuous Integration Platform that works for a variety of use cases at Pinterest. Optimizations like build-collector and the custom release rules build the foundational layer of the platform from which we can create more enhancements that will further improve the velocity and health of our CI builds. Some of the things we’d like to look into next with Bazel are: remote execution, profiling, and a system for automatically detecting and excluding flaky tests.
