How We Designed Our Continuous Integration System to be More Than 50% Faster

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Urvashi Reddy | Software Engineer, Engineering Productivity Team


Earlier this year, the Engineering Productivity team at Pinterest published a blog post called How a one line change decreased our clone times by 99%. In that post, we described how a simple Git configuration change sped up clone times in one of our largest repositories at Pinterest. In this post, we’ll talk about how we significantly decreased build times in the CI that serves another major repository at Pinterest. Spoiler alert: it was not a one line change!

The content covered in this blog was presented at BazelCon 2020. Check out the presentation Designing a Language Agnostic CI with Bazel Queries.

The Engineering Productivity team’s vision is to “build a developer platform that inspires developers to do their best work.” One of the integral pieces of this platform is the Continuous Integration (CI) pipelines. The CI pipelines are responsible for validating code changes and producing release artifacts that can be deployed to one of our supported Continuous Delivery platforms. With over 1,000 engineers at the company, our team faces the interesting challenge of providing reliable, fast CI pipelines that serve the major repositories at scale.

In order to meet those outcomes, our team made a few key design choices:

  • Adopt Bazel as our build tool
  • Use language-based monorepos
  • Test and release only the changed code
  • Leverage Bazel’s BUILD file as a contract between CI and developers
  • Create release abstractions with custom Bazel rules
  • Parallelize work as much as possible

Adopting Bazel

Choosing a single build tool that is multi-language allows our team to create CI workflows that can be applied to any repository using Bazel at Pinterest. Since Bazel is hermetic by design, we can run Bazel targets in CI without needing to configure or manage dependencies on the host machines.

Language-based monorepos

Our CI pipelines are triggered when code is committed to a repository, which means having a CI pipeline for every repository. In order to limit the number of CI pipelines and repositories we have to manage, we group our services into one repository per language.

If you’re interested in learning more about the above two choices, check out another BazelCon 2020 presentation from our team called Pinterest’s Journey to a Monorepo.

Test and release only changed code

At scale, running every target within a repository is expensive. Even with Bazel’s cache, running all test targets with bazel test //... still means spending time fetching and loading dependencies for every target. Those network calls to the cache are time consuming and can be avoided entirely if only the minimal set of targets is run.

Additionally, the services within our monorepos vary in commit frequency. Some of them are contributed to daily while others are more sporadic. By running in CI only what’s affected by a change, we can significantly speed up our build times.

So how do we get the minimal set of targets to run? We created a Golang CLI called build-collector to report the targets to run in CI. The CLI takes in a set of commits and uses Bazel queries to output the list of targets to run. The CLI looks at the files that were changed and runs the appropriate query to find the affected targets. For example, if a couple of source files were changed, build-collector would run the following query:
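
The query appeared as an image in the original post; a minimal sketch of its shape, using hypothetical file labels, is:

```shell
# Find every target in the workspace (//...) that transitively depends on
# the changed source files. The file labels below are hypothetical examples.
bazel query "rdeps(//..., set(//app/src:Foo.java //app/src:Bar.java))"
```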

The above command uses the rdeps query function to find the reverse dependencies of the source files. The output is a list of targets we can run in CI. In order to get test targets specifically, build-collector wraps the above with a filter using the kind query function:
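
Sketched with hypothetical file labels, the wrapped query looks like:

```shell
# Keep only test rules (e.g. java_test) from the reverse-dependency set.
bazel query "kind(test, rdeps(//..., set(//app/src:Foo.java //app/src:Bar.java)))"
```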

Note: Alternatively, the tests query function can be used to filter for test targets.

This is just one type of query that build-collector runs. The full list of queries is explained in the Designing a Language Agnostic CI with Bazel Queries presentation. At this point, you might be wondering how we get release targets. We’ll cover that further below when we talk about our custom release rule implementation.

In our CI script, we call build-collector in the following way:
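
The exact invocation was shown as an image in the original post; the flag names below are hypothetical, but the shape of the call is that build-collector receives a commit range and writes its results to JSON files:

```shell
# Flag names are illustrative -- the post does not document the CLI's
# actual interface.
build-collector \
  --commits "${PREVIOUS_COMMIT}..${CURRENT_COMMIT}" \
  --output-dir "${CI_OUTPUT_DIR}"
```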

Here’s an example of build-collector’s output file for test targets. The same JSON schema is used for release targets as well.
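
An illustrative output file (the field names are an assumption; the post only states that test and release targets share one JSON schema):

```json
{
  "targets": [
    "//service-a/src/test:unit_tests",
    "//service-b/src/test:integration_tests"
  ]
}
```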

The CI script parses the JSON files outputted by build-collector and runs the targets in parallel across multiple workers.

Leverage Bazel’s BUILD file as a contract between CI and developers

We want Pinterest engineers to focus on feature work without having to learn too much about how our CI is set up. Given that we also want to run the minimal set of targets in CI, we need a way for developers to communicate which targets are for local development only and which parts of their service should be tested and released in CI. This is where the Bazel BUILD file comes in handy, since it is already the place where developers define the tests and release artifacts pertaining to their service. Developers follow a few simple conventions in the BUILD file so that our CI can figure out exactly what to build.

Those conventions are:

  • Use test rule types for running tests (standard Bazel practice)
  • Use Pinterest custom release rules for generating release artifacts
  • Use supported tags like no-ci to indicate what should not run in the pipelines

Create release abstractions with custom Bazel rules

At Pinterest, we support a number of different artifact types that are released to various stores. For example, Docker images are sent to an internal registry, JARs are published to Artifactory, etc. To support these workflows, we implemented custom Bazel rules for the common use cases at Pinterest. The custom rules help us create an abstraction over the infrastructure. All developers need to do is indicate what they want to publish within the BUILD file using our custom rules.

A common workflow is creating and publishing Docker images that are then referenced when deploying to EC2 or Kubernetes. Below is an example of how engineers can use a custom Bazel rule called container_release to get CI to make their release artifacts available for deployment.
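
The example BUILD file appeared as an image in the original post. A sketch of what such a file could look like, with hypothetical load paths and attribute names for the internal container_release rule:

```python
load("@io_bazel_rules_docker//container:container.bzl", "container_image")
# Hypothetical load path for Pinterest's internal release rules.
load("//tools/release:defs.bzl", "container_release")

# Build the service's Docker image with the open source rule.
container_image(
    name = "example_service_image",
    base = "@java_base//image",
    files = [":example_service_deploy.jar"],
)

# Publish the image and deployment artifacts. Attribute names below are
# illustrative, not the actual internal API.
container_release(
    name = "example_service_release",
    image = ":example_service_image",
    artifacts = [
        "teletraan/deploy.yaml",  # EC2 deployments via Teletraan
        "hermez/deploy.yaml",     # Kubernetes workloads via Hermez
    ],
)
```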

In this example BUILD file, the service has created a Docker image using the open source container_image rule. Using the custom container_release rule, the service author can publish the Docker image to a Pinterest registry as well as specify what deployment artifacts should be made available to our Continuous Delivery platforms (Teletraan for EC2 deployments and Hermez for Kubernetes workloads).

There are a few benefits we get from implementing custom release rules:

  • It abstracts the infrastructure layers for our developers. They don’t have to be aware of where the deployment artifacts end up or how they are consumed by our CD platforms.
  • Developers control what parts of their service are released via version controlled code.
  • We can support dev versions of their release artifacts by controlling where the artifacts are released.

That last benefit is made possible with another release abstraction within the custom release rule implementation. Each Pinterest custom release rule is actually a Bazel macro that generates two custom release rules: artifact_ci_release and artifact_dev_release. Our developers don’t see or interact with these rules directly, but they are used by our CI and local development workflows to ensure that they are run in the right context. For instance, below is the Bazel query build-collector runs to obtain release targets for source code changes:
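
The query was shown as an image in the original post; it can be sketched as a kind filter over the reverse-dependency set, with hypothetical file labels and a rule-name pattern matching the generated CI release rules:

```shell
# Select only the CI variants of the generated release rules for targets
# affected by the changed source files.
bazel query "kind('.*_ci_release', rdeps(//..., set(//app/src:Foo.java)))"
```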

A further optimization we made here was to control, within the artifact_*_release implementation, the order in which the release artifacts are published. For instance, Docker images are pushed to the registry before the YAML files that reference them are published. Doing it this way kept the querying logic in build-collector fast and straightforward.

Parallelize as much as possible

In order to make sure our CI is running as fast as possible, we want to parallelize running targets wherever possible. We currently achieve this with what we call the Dispatcher Model. The Dispatcher Model is pretty simple: we figure out what targets need to run in CI and dispatch the execution of those targets to workers that run in parallel.

This has a significant benefit when running release targets. If a developer only cares about releasing artifacts from a few services they contribute to, they shouldn’t have to wait for all the other services in the CI build to be finished. Running release targets independently and in parallel provides developers with their release artifacts as soon as they are ready.

What about Pull Request (PR) builds?

Our PR pipeline kicks off a CI run every time a new pull request is created. We patch the code changes and use a temporary commit to pass to build-collector. This allows us to easily reuse the same CI setup for PR builds as well. The only difference is that we don’t create any release artifacts; instead, we check that the release targets compile by running bazel build.
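
A compile-only check of a release target might look like the following (the target label is a hypothetical example):

```shell
# PR builds: verify the release target compiles, but do not publish anything.
bazel build //service-a:example_service_release
```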

The Results

At the beginning of this year, we invested in the above design choices in a repo called Optimus. Optimus is a monorepo that houses 120+ Java services and holds some of our most critical data platforms at Pinterest. Optimus and its CI pipeline serve 300 monthly active contributors.

At the beginning of this year, we didn’t yet use build-collector or the release rule abstractions in Optimus. Instead, we ran all the targets within a service whenever its code changed, and we used granular release rules to publish release artifacts. At that time, the P50 time for a CI build was 52 mins and the P90 time was 69 mins. Within a week of migrating to our new CI design, the P50 time dropped to 19 mins and the P90 time to 49 mins. That’s 2.7x faster for P50 and 1.4x for P90!

Chart comparing build times the week before and after the CI migration

One month distribution of build times with the old CI

One month distribution of build times with new CI

Learnings

Bazel is a powerful tool we can leverage to create a Continuous Integration Platform that works for a variety of use cases at Pinterest. Optimizations like build-collector and the custom release rules build the foundational layer of the platform from which we can create more enhancements that will further improve the velocity and health of our CI builds. Some of the things we’d like to look into next with Bazel are: remote execution, profiling, and a system for automatically detecting and excluding flaky tests.
