Scaling Cache Infrastructure at Pinterest


By Kevin Lin | Software Engineer, Storage and Caching


Demand on Pinterest’s core infrastructure systems is accelerating faster than ever as more Pinners come to Pinterest to find inspiration. One of those core storage systems is a distributed cache layer that fronts many services and databases near the bottom of Pinterest’s infrastructure stack and is responsible for absorbing the vast majority of the backend traffic driven by this growth.

Pinterest’s distributed cache fleet spans an EC2 instance footprint of thousands of machines, caching hundreds of terabytes of data and serving more than 150 million requests per second at peak. This cache layer optimizes top-level performance by driving down latency across the entire backend stack and provides significant cost efficiency by reducing the capacity required for expensive backends. In this post, we’ll take a technical deep dive into the infrastructure that supports Pinterest’s cache fleet at scale.

Application Data Caching

Every API request incoming to Pinterest internally fans out to a complex tree of RPCs through the stack, hitting tens of services before completion of its critical path. This can include services for querying core data like boards and Pins, recommendation systems for serving related Pins, and spam detection systems. At many of these layers, the result of a discrete unit of work can be cached in a transient store for future reuse, as long as its input data can be described by a unique key.

At Pinterest, the most common use of the distributed cache layer is storing such results of intermediate computations with lookaside semantics. This allows the cache layer to absorb a significant share of traffic that would otherwise be destined for compute-expensive or storage-expensive services and databases. With both single-digit millisecond tail latency and an extraordinarily low infrastructure dollar cost per request served, the distributed cache layer offers a performant and cost-efficient mechanism to scale a variety of backends to meet growing Pinterest demand.
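To make the lookaside pattern concrete, below is a minimal sketch in Python using the open source pymemcache client. The key scheme, TTL, and fetch_board_from_db helper are hypothetical stand-ins rather than Pinterest’s actual client code; in production, requests would go through the routing layer described later rather than directly to a single memcached host.

```python
# Minimal lookaside (cache-aside) sketch. fetch_board_from_db() is a
# hypothetical stand-in for an expensive backend call.
import json

from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))   # in practice, a local routing proxy
TTL_SECONDS = 12 * 3600                # illustrative TTL


def fetch_board_from_db(board_id: int) -> dict:
    # Stand-in for the real storage or service call.
    return {"id": board_id, "title": "example board"}


def get_board(board_id: int) -> dict:
    key = f"board:{board_id}"          # unique key describing the input data
    cached = cache.get(key)
    if cached is not None:             # hit: skip the expensive backend entirely
        return json.loads(cached)

    board = fetch_board_from_db(board_id)   # miss: do the real work once
    cache.set(key, json.dumps(board).encode(), expire=TTL_SECONDS)
    return board
```

On a hit, the expensive backend is never touched; on a miss, the result is computed once and written back so subsequent requests for the same key are served from memory.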

Figure 1: Highly simplified lifecycle of an API request through Pinterest’s main API service, its dependency service backends, and the distributed cache layer.

Offering a distributed cache tier as a service allows application developers to focus on implementing business logic without worrying about distributed data consistency, high availability, or memory capacity. Cache clients use a universal routing abstraction layer that ensures applications have a fault-tolerant and consistent view of data. The server fleet can additionally be scaled out independently of the application layer to transparently adjust memory or throughput capacity to accommodate changes in resource usage profiles.

The Backbone of Distributed Caching: Memcached and Mcrouter

Memcached and Mcrouter form the backbone of Pinterest’s distributed caching infrastructure and play a critical role in Pinterest’s storage infrastructure stack. Memcached is an open source, highly efficient, in-memory key-value store written in pure C. Mcrouter is a layer 7 memcached protocol proxy that sits in front of the memcached fleet and provides powerful high availability and routing features.

Memcached is an attractive choice as a caching solution:

  • Thanks in part to its asynchronous event-driven architecture and multithreaded processing model, memcached is extremely efficient and easily amenable to horizontal scaling to meet capacity demands.
  • Extstore helps realize incredible storage efficiency wins with a secondary, warm storage tier located on the instance’s NVMe flash disk.
  • Memcached’s deliberately simple architecture provides flexibility in building abstractions on top of it and makes it easy to scale horizontally to meet increased demand. A single memcached process is a simple key-value store that deliberately has no knowledge of its peers, or even of the notion of a memcached cluster.
  • Memcached has been battle-tested for correctness and performance over decades of development and is surrounded by an active open source community (which has accepted several Pinterest patches upstream).
  • Memcached ships with native support for TLS termination, allowing us to secure the entire fleet with mutual TLS-authenticated traffic (with additional SPIFFE-based authorization access controls built in-house).

Mcrouter was open sourced by Facebook in 2014 and played a crucial role in scaling their memcached deployment. It fits well into Pinterest’s architecture for similar reasons:

  • Mcrouter acts as an effective abstraction of the entire memcached server fleet by providing application developers a single endpoint for interacting with the entire cache fleet. Additionally, using Mcrouter as the single interface to the system ensures universal, globally consistent traffic behavior across all services and machines at Pinterest.
  • Mcrouter presents a decoupled control plane and data plane: the entire topology of the memcached server fleet is organized into “pools” (logical clusters), while all request routing policies and behaviors dictating the interaction between clients and server pools are managed independently.
  • Mcrouter’s configuration API provides robust building blocks for complex routing behaviors, including zone-affinity routing, replication for data redundancy, multi-level cache tiers, and shadow traffic.
  • As a layer 7 proxy that speaks memcached’s ASCII protocol, mcrouter exposes intelligent protocol-specific features like request manipulation (TTL modification, in-flight compression, and more).
  • Rich observability features are provided out of the box at no cost to the client application, giving detailed visibility into memcached traffic across all of our infrastructure. Most important to us are percentile request latencies, throughput sliced along individual client and server dimensions, request trends by key prefix and key pattern, and error rates for detecting misbehaving servers.

Figure 2: Overview of request routing from mcrouter to memcached. Every key prefix is associated with a routing policy; two examples are shown.

In practice, mcrouter is deployed as a service-colocated, out-of-process proxy sidecar. As shown in Figure 2, applications (written in any language) send memcached protocol requests to mcrouter on loopback, and mcrouter proxies those requests to thousands of upstream memcached servers. This architecture allows us to build out robust features in a fully managed cache server fleet, entirely transparent to consuming services.
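To illustrate the split between pools and routing policy, here is a minimal mcrouter-style configuration sketch, written as a Python dict for readability (mcrouter consumes the equivalent JSON). The hostnames and paths are hypothetical; only the overall shape, a "pools" section describing topology and a separate "route" section describing policy, reflects how such a configuration is organized.

```python
# Illustrative mcrouter configuration sketch (hostnames are hypothetical).
import json

config = {
    # Data plane topology: which servers exist and how they are grouped.
    "pools": {
        "main": {
            "servers": [
                "memcached-001.example:11211",
                "memcached-002.example:11211",
                "memcached-003.example:11211",
            ]
        }
    },
    # Control plane policy: PoolRoute hashes each key across the pool's servers.
    "route": "PoolRoute|main",
}

with open("mcrouter.json", "w") as f:
    json.dump(config, f, indent=2)

# The sidecar would then be launched with something like:
#   mcrouter --config-file=mcrouter.json -p 5000
```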

While memcached has been a part of the Pinterest infrastructure stack since the early days, our strategy for scaling its client-side counterpart has evolved significantly over the years. In particular, routing and discovery were first handled in client-side libraries (which were brittle and coupled to binary deploys), then by an in-house routing proxy (which didn’t provide extensible building blocks for high availability), and finally by mcrouter.

Compute and Storage Efficiency

Memcached is highly efficient: a single r5.2xlarge EC2 instance can sustain in excess of 100K requests per second and tens of thousands of concurrent TCP connections without tangible client-side latency degradation, making memcached Pinterest’s most throughput-efficient production service. This efficiency comes both from its carefully written C implementation and from its architecture, in which multiple worker threads each run an independent libevent-driven event loop to service incoming connections.

At Pinterest, memcached’s extstore drives huge wins in storage efficiency for use cases ranging from Visual Search to personalized search recommendation engines. Extstore expands cached data capacity onto a locally mounted NVMe flash disk in addition to DRAM, which increases available per-instance storage capacity from ~55 GB (r5.2xlarge) to nearly 1.7 TB (i3.2xlarge) for a fraction of the instance cost. In practice, extstore has benefited data capacity-bound use cases without sacrificing end-to-end latency, despite the several orders of magnitude difference between DRAM and SSD response times. Extstore’s built-in tuning knobs have allowed us to identify a sweet spot that balances disk I/O, disk-to-memory recache rate, compaction frequency and aggressiveness, and client-side tail response time.
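As a rough illustration of these knobs, the sketch below launches an extstore-enabled memcached instance from Python. The flag values are purely illustrative and do not reflect our production tuning; ext_path, ext_threads, ext_recache_rate, and ext_item_size are among the extstore options referred to above.

```python
# Sketch of launching memcached with extstore enabled (values are illustrative).
import subprocess

memcached_cmd = [
    "memcached",
    "-m", "49152",                              # DRAM cache size in MB (hot tier)
    "-c", "65536",                              # allow tens of thousands of connections
    "-o", "ext_path=/mnt/nvme/extstore:1600G",  # warm tier backed by local NVMe flash
    "-o", "ext_threads=8",                      # background flash I/O threads
    "-o", "ext_item_size=512",                  # only items at least this large go to flash
    "-o", "ext_recache_rate=2000",              # how readily flash-read items return to DRAM
]

subprocess.run(memcached_cmd, check=True)
```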

High Availability

All infrastructure systems at Pinterest are highly available, and our caching systems are no exception. Leveraging rich routing features in mcrouter, our memcached fleet has a wide array of fault tolerance features:

  • Automatic failover for partially degraded or completely offline servers. Networks are inherently flaky and lossy; the entire caching stack treats this as a non-negotiable fact and is designed to remain available when servers are unreachable or slow. Fortunately, cache data is transient by nature, which relaxes the durability requirements that persistent stores like databases must meet. Within Pinterest, mcrouter automatically fails requests over to a globally shared cluster when individual servers are offline or responding too slowly, and it brings servers back into the serving pool through active health checks. Combined with rich proxy-layer instrumentation of individual server failures, this allows operators to identify and replace misbehaving servers with minimal production downtime (a minimal failover routing sketch follows this list).
  • Data redundancy through transparent cross-zone replication. Critical use cases are replicated across multiple clusters spanning distinct AWS Availability Zones (AZs). This allows total loss of an AZ with zero downtime: all requests are automatically redirected to a healthy replica located in a separate AZ, where a complete redundant copy of the data is available.
  • Isolated shadow testing against real production traffic. Traffic routing features within mcrouter allow us to perform a variety of resiliency exercises including cluster-to-cluster dark traffic and artificial latency and downtime injection against real production requests without impacting production.
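As a minimal sketch of the failover behavior above (again expressed as a Python dict standing in for mcrouter’s JSON configuration), the route below tries the primary pool first and falls back to a globally shared failover pool when a primary server fails; the pool names are hypothetical.

```python
# Illustrative failover route: primary pool first, shared pool on failure.
import json

route = {
    "type": "FailoverRoute",
    "children": [
        "PoolRoute|primary-az1",      # normal destination, consistently hashed
        "PoolRoute|shared-failover",  # globally shared cluster used when the primary fails
    ],
}

print(json.dumps(route, indent=2))
```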

Load Balancing and Data Sharding

One of the key features of a distributed system is horizontal scalability — the ability to scale out rather than scale up to accommodate additional traffic growth. At Pinterest, the vast majority of our caching workloads are throughput-bound, requiring the number of instances in a cluster to grow roughly linearly with the volume of inbound requests. However, memcached itself is an extremely simple key-value store with no knowledge of its peers in the cluster. How are hundreds of millions of requests per second actually routed through the network to the right servers?

Mcrouter applies a hashing algorithm against the cache key of every incoming request to deterministically shard the request to one host within a pool. This works well for evenly distributing traffic among servers, but memcached has a unique requirement that its clusters need to be arbitrarily scalable — operators need to be able to freely adjust cluster capacity in response to changing traffic demands while minimizing the client-side impact.

Consistent hashing ensures that most of the keyspace continues to map to the same servers even as the total number of eligible shards increases or decreases. This lets the system scale out transparently to the client layer, since the hit rate impact is highly localized and predictable, reducing the chance that small changes in capacity cause catastrophic drops in cluster-wide hit rate.
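The sketch below illustrates this property with a self-contained Python hash ring (this is not mcrouter’s actual implementation, which uses its own hash functions): adding an eleventh server to a ten-server ring remaps only roughly one eleventh of the keys.

```python
# Toy consistent-hash ring demonstrating how few keys move when a server is added.
import bisect
import hashlib


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, servers, vnodes=100):
        # Each server owns many points ("virtual nodes") on the ring.
        self._points = sorted((_hash(f"{s}-{i}"), s) for s in servers for i in range(vnodes))
        self._hashes = [h for h, _ in self._points]

    def lookup(self, key: str) -> str:
        # A key belongs to the first ring point at or after its hash, wrapping around.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._points[idx][1]


keys = [f"board:{i}" for i in range(100_000)]
before = Ring([f"mc-{i}" for i in range(10)])
after = Ring([f"mc-{i}" for i in range(11)])   # one server added

moved = sum(before.lookup(k) != after.lookup(k) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")  # roughly 9%, i.e. about 1/11 of the keyspace
```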

Figure 3: Consistent hashing keeps most of the keyspace server allocation intact when one node is added to an existing pool.

The client-side routing layer maps a single key prefix to one or more such consistently hashed pools behind one of several routing policies, including AZ-affinity preference routing for cross-AZ replicated clusters, L1L2 routing for in-memory clusters backed by a fallthrough flash-based capacity cluster, and more. This allows isolating traffic and thus allocating capacity by client use case, and it ensures consistent cache routing behavior from any client machine in Pinterest’s fleet.

Tradeoffs and Considerations

All sufficiently complex infrastructure systems are characterized by (often highly nuanced) trade-offs. In the course of building out and scaling our caching systems, we weighed the costs and benefits of many trade-offs. A few are highlighted below:

  • An intermediary proxy layer adds non-trivial compute and I/O overhead, especially for a performance-critical system with tight latency SLOs. However, the high availability abstractions, flexible routing behaviors, and many other features provided by mcrouter far outweigh the performance penalty.
  • A globally shared proxy configuration presents risks in change rollouts, since all control plane changes are applied across the entire fleet of tens of thousands of machines at Pinterest on deployment. However, this also ensures globally consistent knowledge of the memcached fleet topology and associated routing policies, regardless of where or how a client is deployed within Pinterest.
  • We operate ~100 distinct memcached clusters, many of which have different tenancy characteristics (dedicated versus shared), hardware instance types, and routing policies. While this presents a sizable maintenance burden on the team, it also allows for effective performance and availability isolation per use case, while also providing opportunities for efficiency optimization by choosing parameters and instance types most appropriate for a particular workload’s usage profile.
  • Leveraging a consistent hashing scheme for load distribution among a pool of upstream servers works well for the majority of cases, even if the keyspace is characterized by clusters of similarly-prefixed keys. However, this doesn’t solve problems with hot keys — an abnormal increase in request volume for a specific set of keys still results in load imbalance caused by hot shard(s) in the server cluster.

Future Work

Looking forward, we hope to continue delivering improved efficiency, reliability, and performance of Pinterest’s cache infrastructure. This includes experimental projects like embedding memcached core directly into a host application process for performance-critical use cases (to allow memcached to share memory space with the service process as well as eliminate network and I/O overhead) and reliability projects like designing a robust solution for multi-region redundancy.

Thanks to the entire Storage and Caching team at Pinterest for supporting this work, especially Ankita Girish Wagh and Lianghong Xu.
