Scaling Cache Infrastructure at Pinterest


By Kevin Lin | Software Engineer, Storage and Caching


Demand on Pinterest’s core infrastructure systems is accelerating faster than ever as more Pinners come to Pinterest to find inspiration. A distributed cache layer fronting many services and databases is one of our core storage systems that sits near the bottom of Pinterest’s infrastructure stack, responsible for absorbing the vast majority of backend traffic driven by this growth.

Pinterest’s distributed cache fleet spans an EC2 instance footprint consisting of thousands of machines, caching hundreds of terabytes of data served at more than 150 million requests per second at peak. This cache layer optimizes top-level performance by driving down latency across the entire backend stack and provides significant cost efficiency by reducing the capacity required for expensive backends. We’ll do a technical deep dive into the infrastructure that supports Pinterest’s cache fleet at scale.

Application Data Caching

Every API request incoming to Pinterest internally fans out to a complex tree of RPCs through the stack, hitting tens of services before completion of its critical path. This can include services for querying core data like boards and Pins, recommendation systems for serving related Pins, and spam detection systems. At many of these layers, the result of a discrete unit of work can be cached in a transient store for future reuse, as long as its input data can be described by a unique key.

At Pinterest, the most common use of the distributed cache layer is storing such results of intermediate computations with lookaside semantics. This allows the cache layer to absorb a significant share of traffic that would otherwise be destined for compute-expensive or storage-expensive services and databases. With both single-digit millisecond tail latency and an extraordinarily low infrastructure dollar cost per request served, the distributed cache layer offers a performant and cost-efficient mechanism to scale a variety of backends to meet growing Pinterest demand.
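To make the lookaside semantics concrete, here is a minimal sketch of the pattern, assuming the open source pymemcache client; the key format, TTL, and compute_related_pins() helper are hypothetical stand-ins, not Pinterest's actual implementation:

```python
# Minimal lookaside (cache-aside) sketch using the open source pymemcache
# client. Key format, TTL, and compute_related_pins() are illustrative.
import json

from pymemcache.client.base import Client

# The mcrouter sidecar speaks the memcached protocol on loopback.
cache = Client(("localhost", 11211))

def compute_related_pins(pin_id: str) -> list:
    # Stand-in for a compute-expensive backend call.
    return [f"pin:{pin_id}:{i}" for i in range(3)]

def get_related_pins(pin_id: str) -> list:
    key = f"related_pins:{pin_id}"  # unique key describing this unit of work
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)   # hit: skip the expensive backend entirely
    result = compute_related_pins(pin_id)  # miss: do the expensive work once
    cache.set(key, json.dumps(result), expire=3600)  # store for future reuse
    return result
```

On a hit, the expensive backend call is skipped entirely; on a miss, the result is computed once and stored under its unique key for future reuse.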

Figure 1: Highly simplified lifecycle of an API request through Pinterest’s main API service, its dependency service backends, and the distributed cache layer.

Offering a distributed cache tier as a service allows application developers to focus on implementing business logic without worrying about distributed data consistency, high availability, or memory capacity. Cache clients use a universal routing abstraction layer that ensures applications have a fault-tolerant and consistent view of data. The server fleet can additionally be scaled out independently of the application layer to transparently adjust memory or throughput capacity to accommodate changes in resource usage profiles.

The Backbone of Distributed Caching: Memcached and Mcrouter

Memcached and Mcrouter form the backbone of Pinterest’s distributed caching infrastructure and play a critical role in Pinterest’s storage infrastructure stack. Memcached is an open source, highly efficient, in-memory key-value store written in pure C. Mcrouter is a layer 7 memcached protocol proxy that sits in front of the memcached fleet and provides powerful high availability and routing features.

Memcached is an attractive choice as a caching solution:

  • Thanks in part to its asynchronous event-driven architecture and multithreaded processing model, memcached is extremely efficient and easily amenable to horizontal scaling to meet capacity demands.
  • Extstore helps realize incredible storage efficiency wins with a secondary, warm storage tier located on the instance’s NVMe flash disk.
  • Memcached’s deliberately simple architecture provides flexibility in building abstractions on top of it and makes horizontal scaling straightforward as demand grows. A single memcached process is a simple key-value store that deliberately has no knowledge of its peers, or even of the notion of a memcached cluster.
  • Memcached has been battle-tested for correctness and performance over nearly two decades of development and is surrounded by an active open source community (which has also accepted several Pinterest patches upstream).
  • Memcached ships with native support for TLS termination, allowing us to secure the entire fleet with mutual TLS-authenticated traffic (with additional SPIFFE-based authorization access controls built in-house).

Mcrouter was open sourced by Facebook in 2014 and has played a crucial role in scaling its memcached deployment. It fits well into Pinterest’s architecture for similar reasons:

  • Mcrouter acts as an effective abstraction over the memcached server fleet, providing application developers with a single endpoint for interacting with the entire cache fleet. Additionally, using mcrouter as the single interface to the system ensures universal, globally consistent traffic behavior across all services and machines at Pinterest.
  • Mcrouter presents a decoupled control plane and data plane: the entire topology of the memcached server fleet is organized into “pools” (logical clusters), while all request routing policies and behaviors dictating the interaction between clients and server pools are managed independently.
  • Mcrouter’s configuration API provides robust building blocks for complex routing behaviors, including zone-affinity routing, replication for data redundancy, multi-level cache tiers, and shadow traffic.
  • As a layer 7 proxy that speaks memcached’s ASCII protocol, mcrouter exposes intelligent protocol-specific features like request manipulation (TTL modification, in-flight compression, and more).
  • Rich observability features are provided out of the box at no cost to the client application, providing detailed visibility into memcached traffic across all of our infrastructure. Most important to us are percentile request latency, throughput sliced along individual client and server dimensions, request trends by key prefixes and key patterns, and error rates for detecting misbehaving servers.

Figure 2: Overview of request routing from mcrouter to memcached. Every key prefix is associated with a routing policy; two examples are shown.

In practice, mcrouter is deployed as a service-colocated, out-of-process proxy sidecar. As shown in Figure 2, applications (written in any language) send memcached protocol requests to mcrouter on loopback, and mcrouter proxies those requests to thousands of upstream memcached servers. This architecture allows us to build out robust features in a fully managed cache server fleet, entirely transparent to consuming services.
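As a rough illustration, a minimal sidecar configuration in the JSON shape mcrouter consumes might look like the sketch below. The pool name, server addresses, and port are hypothetical, and in practice pool membership is kept up to date by service discovery rather than a static file:

```python
# Sketch of a minimal mcrouter sidecar config, written out as the JSON
# document mcrouter consumes. All names and addresses are illustrative.
import json

config = {
    "pools": {
        # A "pool" is a logical cluster of memcached servers.
        "main": {
            "servers": ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"],
        },
    },
    # Consistently hash every request across the "main" pool.
    "route": "PoolRoute|main",
}

with open("mcrouter-config.json", "w") as f:
    json.dump(config, f)

# The sidecar is then launched on loopback with something like:
#   mcrouter -p 11211 --config-file=mcrouter-config.json
# after which applications simply speak the memcached protocol to localhost.
```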

While memcached has been part of the Pinterest infrastructure stack since the early days, our strategy for scaling its client-side counterpart has evolved significantly over the years. In particular, routing and discovery were first handled in client-side libraries (which were brittle and coupled to binary deploys), then by an in-house routing proxy (which didn’t provide extensible building blocks for high availability), and finally by mcrouter.

Compute and Storage Efficiency

Memcached is highly efficient: a single r5.2xlarge EC2 instance is capable of sustaining in excess of 100K requests per second and tens of thousands of concurrent TCP connections without tangible client-side latency degradation, making memcached Pinterest’s most throughput-efficient production service. This efficiency is due both to its carefully written C implementation and to its architecture, in which multiple worker threads independently run a libevent-driven event loop to service incoming connections.

At Pinterest, memcached’s extstore drives huge wins in storage efficiency for use cases ranging from Visual Search to personalized search recommendation engines. Extstore expands cached data capacity beyond DRAM to a locally mounted NVMe flash disk, which increases available per-instance storage capacity from ~55 GB (r5.2xlarge) to nearly 1.7 TB (i3.2xlarge) for a fraction of the instance cost. In practice, extstore has benefitted data capacity-bound use cases without sacrificing end-to-end latency, despite the difference of several orders of magnitude between DRAM and SSD response times. Extstore’s built-in tuning knobs have allowed us to identify a sweet spot that balances disk I/O, disk-to-memory recache rate, compaction frequency and aggressiveness, and client-side tail response time.
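For a sense of what those knobs look like, the sketch below launches a memcached instance with extstore enabled. The flag names come from upstream memcached’s extstore options, but the values are purely illustrative, not Pinterest’s production tuning:

```python
# Illustrative launch arguments for a memcached instance with extstore.
# Flag names follow upstream memcached's extstore options; the values
# are hypothetical, not Pinterest's production configuration.
import subprocess

memcached_argv = [
    "memcached",
    "-m", "49152",  # DRAM cache size in MB (the hot tier)
    "-t", "8",      # worker threads
    "-o", ",".join([
        "ext_path=/mnt/nvme/extstore:1600G",  # flash-backed warm tier
        "ext_threads=8",          # background flash I/O threads
        "ext_item_size=512",      # only spill items >= 512 bytes to flash
        "ext_recache_rate=2000",  # how often read items are pulled back to DRAM
    ]),
]
subprocess.run(memcached_argv)
```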

High Availability

All infrastructure systems at Pinterest are highly available, and our caching systems are no exception. Leveraging rich routing features in mcrouter, our memcached fleet has a wide array of fault tolerance features:

  • Automatic failover for partially degraded or completely offline servers. Networks are inherently flaky and lossy; the entire caching stack assumes this as a non-negotiable fact and is designed to maintain availability when servers are unavailable or slow. Fortunately, cache data is transient by nature, which relaxes the durability guarantees that would otherwise be expected of persistent stores like databases. Within Pinterest, mcrouter automatically fails over requests to a globally shared cluster when individual servers are offline or responding too slowly, and it automatically brings servers back into the serving pool through active health checks (a minimal failover route sketch follows this list). Combined with rich proxy-layer instrumentation on individual server failures, this allows operators to identify and replace misbehaving servers with minimal production downtime.
  • Data redundancy through transparent cross-zone replication. Critical use cases are replicated across multiple clusters spanning distinct AWS Availability Zones (AZs). This allows total loss of an AZ with zero downtime: all requests are automatically redirected to a healthy replica located in a separate AZ, where a complete redundant copy of the data is available.
  • Isolated shadow testing against real production traffic. Traffic routing features within mcrouter allow us to perform a variety of resiliency exercises including cluster-to-cluster dark traffic and artificial latency and downtime injection against real production requests without impacting production.
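As a rough illustration of the failover behavior above, an mcrouter route that prefers a primary pool and falls back to a globally shared pool can be sketched as follows. The FailoverRoute handle name follows mcrouter’s public documentation, but the pool names are hypothetical and production configs carry additional health-check and retry tuning:

```python
# Sketch of an mcrouter FailoverRoute, expressed as the JSON-shaped dict
# mcrouter consumes. Requests go to the primary pool first; on error or
# timeout they fail over to a globally shared cluster. Pool names are
# illustrative.
failover_route = {
    "type": "FailoverRoute",
    "children": [
        "PoolRoute|primary",        # consistently hashed primary pool
        "PoolRoute|global_shared",  # shared cluster absorbing failed-over requests
    ],
}
```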

Load Balancing and Data Sharding

One of the key features of a distributed system is horizontal scalability: the ability to scale out rather than scale up to accommodate additional traffic growth. At Pinterest, the vast majority of our caching workloads are throughput-bound, requiring the number of instances in a cluster to scale roughly linearly with the volume of inbound requests. However, memcached itself is an extremely simple key-value store with no knowledge of its peers in the cluster. How are hundreds of millions of requests per second actually routed through the network to the right servers?

Mcrouter applies a hashing algorithm to the cache key of every incoming request to deterministically shard the request to one host within a pool. This works well for evenly distributing traffic among servers, but memcached clusters have a unique requirement: they need to be arbitrarily scalable, meaning operators must be able to freely adjust cluster capacity in response to changing traffic demands while minimizing client-side impact.

Consistent hashing ensures that most of a keyspace partition maps to the same server even as the total number of eligible shards increases or decreases. This allows the system to scale out transparently to the client layer due to highly localized and predictable hit rate impact, thus reducing the chance that small changes in capacity cause catastrophic drops in cluster-wide hit rate.
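The toy ring below illustrates this remapping property. Mcrouter itself uses its own hash functions (such as weighted ch3/furc hashing); this Python sketch with virtual nodes is only a stand-in to show why adding a node moves a small, predictable slice of the keyspace:

```python
# Toy consistent hash ring with virtual nodes, to illustrate that adding
# a server remaps only a small fraction of keys. Not mcrouter's actual
# hashing implementation.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        # Place each server at many pseudo-random ring points ("virtual
        # nodes") to smooth out load distribution.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def get_server(self, key: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]

# Adding a tenth server remaps only ~1/10th of the keyspace.
keys = [f"key:{n}" for n in range(10000)]
ring = ConsistentHashRing([f"cache-{i}" for i in range(9)])
before = {k: ring.get_server(k) for k in keys}
ring = ConsistentHashRing([f"cache-{i}" for i in range(10)])
moved = sum(before[k] != ring.get_server(k) for k in keys)
print(f"{moved / len(keys):.1%} of keys remapped")  # roughly 10%, not 100%
```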

Figure 3: Consistent hashing keeps most of the keyspace server allocation intact when one node is added to an existing pool.

The client-side routing layer maps a single key prefix to one or more such consistently hashed pools behind one of several routing policies, including AZ-affinity preference routing for cross-AZ replicated clusters, L1L2 routing for in-memory clusters backed by a fallthrough flash-based capacity cluster, and more. This allows isolating traffic and thus allocating capacity by client use case, and it ensures consistent cache routing behavior from any client machine in Pinterest’s fleet.
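Conceptually, the prefix-to-policy mapping (as in Figure 2) can be sketched as data, with each prefix pointing at a route built from mcrouter’s route handles. The handle names below appear in mcrouter’s documentation, but the prefixes, pool names, and exact parameters here are hypothetical:

```python
# Conceptual sketch of key-prefix -> routing-policy mapping (cf. Figure 2).
# Prefixes, pool names, and parameters are hypothetical.
route_policies = {
    # In-memory L1 pool backed by a flash-based (extstore) L2 capacity pool.
    "ads:": {
        "type": "L1L2CacheRoute",
        "l1": "PoolRoute|ads_memory",
        "l2": "PoolRoute|ads_flash",
    },
    # Cross-AZ replicated pools: prefer the replica in the local AZ and
    # fail over to a sibling AZ on error.
    "homefeed:": {
        "type": "FailoverRoute",
        "children": [
            "PoolRoute|homefeed_us_east_1a",  # local AZ first
            "PoolRoute|homefeed_us_east_1b",
        ],
    },
}
```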

Tradeoffs and Considerations

All sufficiently complex infrastructure systems are characterized by (often highly nuanced) trade-offs. In the course of building out and scaling our caching systems, we weighed the costs and benefits of many trade-offs. A few are highlighted below:

  • An intermediary proxy layer presents a non-trivial additional amount of both compute and I/O overhead, especially for a performance-critical system with tight latency SLOs. However, the high availability abstractions, flexible routing behaviors, and many other features provided by mcrouter far outweigh the performance penalties.
  • A globally shared proxy configuration presents risks in change rollouts, since all control plane changes are applied across the entire fleet of tens of thousands of machines at Pinterest on deployment. However, this also ensures globally consistent knowledge of the memcached fleet topology and associated routing policies, regardless of where or how a client is deployed within Pinterest.
  • We operate ~100 distinct memcached clusters, many of which have different tenancy characteristics (dedicated versus shared), hardware instance types, and routing policies. While this presents a sizable maintenance burden on the team, it also allows for effective performance and availability isolation per use case, while also providing opportunities for efficiency optimization by choosing parameters and instance types most appropriate for a particular workload’s usage profile.
  • Leveraging a consistent hashing scheme for load distribution among a pool of upstream servers works well for the majority of cases, even if the keyspace is characterized by clusters of similarly-prefixed keys. However, this doesn’t solve problems with hot keys — an abnormal increase in request volume for a specific set of keys still results in load imbalance caused by hot shard(s) in the server cluster.

Future Work

Looking forward, we hope to continue delivering improved efficiency, reliability, and performance of Pinterest’s cache infrastructure. This includes experimental projects like embedding memcached core directly into a host application process for performance-critical use cases (to allow memcached to share memory space with the service process as well as eliminate network and I/O overhead) and reliability projects like designing a robust solution for multi-region redundancy.

Thanks to the entire Storage and Caching team at Pinterest for supporting this work, especially Ankita Girish Wagh and Lianghong Xu.
