How a One Line Change Decreased Our Clone Times by 99%

1,096
Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Urvashi Reddy | Software Engineer, Engineering Productivity Team Adam Berry | Tech Lead, Engineering Productivity Team Rui Li | Software Engineer, Engineering Productivity Team


The Engineering Productivity team at Pinterest came across a small change that had a large impact in reducing build times across pipelines. We found that setting the refspec option during git fetch reduced our cloning step by 99%.

The Engineering Productivity team at Pinterest is responsible for supporting the engineers who build and deploy software at the company. Our team maintains a number of infrastructure services and often works on large scale efforts — migrating all software at Pinterest to Bazel, creating a Continuous Delivery platform called Hermez, and maintaining the monorepos that get committed to a few hundred times a day, to name a few.

All our larger efforts are designed to progress on the goal of making software development and delivery at Pinterest a fast and painless experience. Recently, we were reminded of how small details can also have a large impact on that goal. We discovered an overlooked Git option that significantly reduced the build times in our continuous integration pipelines. In order to understand how this small change had such a large impact, we need to share a little information on our monorepos and our pipelines.

Monorepos and Pipelines

We have six main repositories at Pinterest: Pinboard, Optimus, Cosmos, Magnus, iOS, and Android. Each one is a monorepo and houses a large collection of language-specific services. Pinboard is as old as the company and is the largest monorepo. Pinboard has more than 350K commits and is 20GB in size when cloned fully.

Cloning monorepos that have a lot of code and history is time consuming, and we need to do it frequently throughout the day in our continuous integration pipelines. For Pinboard alone, we do more than 60K git pulls on business days. Most of our Jenkins pipeline configuration scripts (written in Groovy) start with a “Checkout” stage where we clone the repository that the later stages will build and test. Here’s what a typical “Checkout” stage looks like:

If we were using the Git CLI directly, it would translate to:

Even though we’re telling Git to do a shallow clone, to not fetch any tags, and to fetch the last 50 commits, we’re still not running this operation as fast as we could be. That’s because we aren’t setting the refspec option. Notice that by not setting it in the pipeline configuration script, we’re telling Git to fetch all refspecs: +refs/heads/*:refs/remotes/origin/*. In the case of Pinboard, that operation would be fetching more than 2,500 branches.

By simply adding the refspec option and specifying which refs we care about (master, in our case), we can limit the refs to the branch we care about and save a lot of time. Here’s what that looks like in our pipeline code:

This simple one line change reduced our clone times by 99% and significantly reduced our build times as a result. Cloning our largest repo, Pinboard went from 40 minutes to 30 seconds. It goes to show that sometimes our small efforts can have a big impact as well.

Learnings

Like most developer productivity teams, we work on large services that have a big impact on our day-to-day experience. However, it’s sometimes the one line fixes that can make a huge difference. Our work is about understanding that.

If you have a passion for improving developer productivity, join our team! We’re hiring for engineering roles on our Engineering Productivity team.

To learn more about Engineering at Pinterest, check out our Engineering Blog, and visit our StackShare page. To view and apply to open opportunities, visit our Careers page.

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.
Tools mentioned in article
Open jobs at Pinterest
Senior Staff Machine Learning Enginee...
San Francisco, CA

About Pinterest:

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. As a Pinterest employee, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping users make their lives better in the positive corner of the internet.

Homefeed is a discovery platform at Pinterest that helps users find and explore their personal interests. We work with some of the largest datasets in the world, tailoring over billions of unique content to 330M+ users. Our content ranges across all categories like home decor, fashion, food, DIY, technology, travel, automotive, and much more. Our dataset is rich with textual and visual content and has nice graph properties — harnessing these signals at scale is a significant challenge. The Homefeed ranking team focuses on the machine learning model that predicts how likely a user will interact with a certain piece of content, as well as leveraging those individual prediction scores for holistic optimization to present users with a feed of diverse content.

What you’ll do:

  • Work on state-of-the-art large-scale applied machine learning projects
  • Improve relevance and the user experience on Homefeed
  • Re-architect our deep learning models to improve their capacity and enable more use cases
  • Collaborate with other teams to build/incorporate various signals to machine learning models
  • Collaborate with other teams to extend our machine learning based solutions to other use cases

What we’re looking for:

  • Passionate about applied machine learning and deep learning
  • 8+ years experience applying machine learning methods in settings like recommender systems, search, user modeling, image recognition, graph representation learning, natural language processing

#L1-EA2

EPM Lead Developer, Adaptive Planning...
San Francisco, CA

About Pinterest: 

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. As a Pinterest employee, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping users make their lives better in the positive corner of the internet.

The EPM technology team at Pinterest is looking for a senior EPM architect who has at least four years of technical experience in Workday Adaptive Planning. You will be the solutions architect who oversees technical design of the complete EPM ecosystem with emphasis on Adaptive Financial and Workforce planning. The right candidate will also need to have hands-on development experience with Adaptive Planning and related technologies. The role is in IT but will work very closely with FP&A and the greater Finance/Accounting teams. Experience with Tableau suite of tools is a plus.

What you'll do: 

  • Together with the EPM Technology team, you will own Adaptive Planning and all related services
  • Oversee architecture of existing Adaptive Planning solution and make suggestions for improvements
  • Solution and lead Adaptive Planning enhancement projects from beginning to end
  • Help EPM Technology team gain deeper understanding of Adaptive Planning and train the team on Adaptive Planning best practices
  • Establish strong relationship with Finance users and leadership to drive EPM roadmap for Adaptive Planning and related technologies
  • Help establish EPM Center of Excellence at Pinterest
  • This is a contract position at Pinterest. As such, the contractor who fills this role will be employed either by our staffing partner (ProUnlimited) or by an agency partner, and not an employee of Pinterest.
  • All interviews will be scheduled and/or conducted by the Pinterest assignment manager. When a finalist has been selected, ProUnlimited or the agency partner will extend the offer and provide assignment details including duration, benefits options and onboarding details.

What we're looking for: 

  • Hands-on design and build experience with all Adaptive Planning technologies: standard sheets, cube sheets, all dimensions, reporting, integration framework, security, dashboarding and OfficeConnect
  • Strong in application design, data integration and application project lifecycle
  • Comfortable working side-by-side with business
  • Ability to translate business requirements to technical requirements
  • Strong understanding in all three financial statements and the different enterprise planning cycles
  • Familiar with Tableau suite of tools

 

Senior Machine Learning Engineer, Con...
Toronto, CA

About Pinterest:

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. As a Pinterest employee, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping users make their lives better in the positive corner of the internet.

On the Content Signals team, you’ll be responsible for building machine learning signals from NLP and CV components to productionizing the end product in batch and real-time setting at Pinterest scale. Our systems offer rich semantics to the recommendation platform and enable the product engineers to build deeper experiences to further engage Pinners. In understanding structured and unstructured content, we leverage embeddings, supervised and semi-supervised learning, and LSH. To scale our systems we leverage Spark, Flink, and low-latency model serving infrastructure.

What you’ll do:

  • Apply machine learning approaches to build rich signals that enable ranking and product engineers to build deeper experiences to further engage Pinners
  • Own, improve, and scale signals over both structured and unstructured content that bring tens of millions of rich content to Pinterest each day
  • Drive the roadmap for next-generation content signals that improve the content ecosystem at Pinterest.

What we’re looking for:

  • Deep expertise in content modeling at consumer Internet scale
  • Strong ability to work cross-functionally and with partner engineering teams
  • Expert in Java, Scala or Python

#LI-EA2

Senior Software Engineer, Shopping Co...
Toronto, CA

About Pinterest: 

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. As a Pinterest employee, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping users make their lives better in the positive corner of the internet.

Pinterest is aiming to build a world-class shopping experience for our users, and has a unique advantage to succeed due to the high shopping intent of Pinners. The new Shopping Content Mining team being founded in Toronto plays a critical role in this journey. This team is responsible for building a brand new platform for mining and understanding product data, including extracting high quality product attributes from web pages and free texts that come from all major retailers across the world, mining product reviews and product relationships, product classification, etc. The rich product data generated by this platform is the foundation of the unified product catalog, which powers all shopping experiences at Pinterest (e.g., product search & recommendations, product detail page, shop the look, shopping ads).

There are unique technical challenges for this team: building large scale systems that can process billions of products, Machine Learning models that require few training examples to generate wrappers for web pages, NLP models that can extract information from free-texts, easy-to-use human labelling tools that generate high quality labeled data. Your work will have a huge impact on improving the shopping experience of 400M+ Pinners and driving revenue growth for Pinterest.

What you’ll do:

  • As a backend engineer, design and build large scale systems that can process billions of products, e.g., information extraction systems using XML parsers.
  • Design and build systems / tools, e.g.,
    • UI for data labeling and ML model diagnostic
    • feature extraction for ML models
    • extraction template / model fast deployment
    • evaluation / outlier detection system for data quality
  • Drive cross functional collaborations with partner teams working on shopping

What we’re looking for:

  • 5+ years of industry experience
  • Expert in Python and Java
  • Hands-on experience with big data technologies (e.g., Hadoop/Spark) and scalable realtime systems that process stream data
  • Nice to have: Familiarity with information extraction techniques for web-pages and free-texts, Experience working with shopping data is a plus, Experience building internal tools for labeling / diagnosing, Basic knowledge of machine learning (or willing to learn!): feature extraction, training, etc.

#LI-EA1

Verified by
Security Engineer
Tech Lead, Big Data Platform
Software Engineer
Talent Brand Manager
Sourcer
Software Engineer
You may also like