Scaling Cache Infrastructure at Pinterest

Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Kevin Lin | Software Engineer, Storage and Caching

Demand on Pinterest’s core infrastructure systems is accelerating faster than ever as more Pinners come to Pinterest to find inspiration. A distributed cache layer fronting many services and databases is one of our core storage systems that sits near the bottom of Pinterest’s infrastructure stack, responsible for absorbing the vast majority of backend traffic driven by this growth.

Pinterest’s distributed cache fleet spans an EC2 instance footprint consisting of thousands of machines, caching hundreds of terabytes of data served at more than 150 million requests per second at peak. This cache layer optimizes top-level performance by driving down latency across the entire backend stack and provides significant cost efficiency by reducing the capacity required for expensive backends. We’ll do a technical deep dive into the infrastructure that supports Pinterest’s cache fleet at scale.

Application Data Caching

Every API request incoming to Pinterest internally fans out to a complex tree of RPCs through the stack, hitting tens of services before completion of its critical path. This can include services for querying core data like boards and Pins, recommendation systems for serving related Pins, and spam detection systems. At many of these layers, the result of a discrete unit of work can be cached in a transient store for future reuse, as long as its input data can be described by a unique key.

At Pinterest, the most common use of the distributed cache layer is storing such results of intermediate computations with lookaside semantics. This allows the cache layer to absorb a significant share of traffic that would otherwise be destined for compute-expensive or storage-expensive services and databases. With both single-digit millisecond tail latency and an extraordinarily low infrastructure dollar cost per request served, the distributed cache layer offers a performant and cost-efficient mechanism to scale a variety of backends to meet growing Pinterest demand.

Figure 1: Highly simplified lifecycle of an API request through Pinterest’s main API service, its dependency service backends, and the distributed cache layer.

Offering a distributed cache tier as a service allows application developers to focus on implementing business logic without worrying about distributed data consistency, high availability, or memory capacity. Cache clients use a universal routing abstraction layer that ensures applications have a fault-tolerant and consistent view of data. The server fleet can additionally be scaled out independently of the application layer to transparently adjust memory or throughput capacity to accommodate changes in resource usage profiles.

The Backbone of Distributed Caching: Memcached and Mcrouter

Memcached and Mcrouter form the backbone of Pinterest’s distributed caching infrastructure and play a critical role in Pinterest’s storage infrastructure stack. Memcached is an open source, highly efficient, in-memory key-value store written in pure C. Mcrouter is a layer 7 memcached protocol proxy that sits in front of the memcached fleet and provides powerful high availability and routing features.

Memcached is an attractive choice as a caching solution:

  • Thanks in part to its asynchronous event-driven architecture and multithreaded processing model, memcached is extremely efficient and easily amenable to horizontal scaling to meet capacity demands.
  • Extstore helps realize incredible storage efficiency wins with a secondary, warm storage tier located on the instance’s NVMe flash disk.
  • Memcached’s deliberately simple architecture provides flexibility in building abstractions on top of it, as well as provides easy horizontal scalability to meet increased demand. A single memcached process by itself is a simple key-value store and deliberately has no knowledge of its peers, or even the notion of a memcached cluster.
  • Memcached has been battle-tested for accuracy and performance over decades of development and is surrounded by an active open source community (which has also accepted several Pinterest patches into upstream).
  • Memcached ships with native support for TLS termination, allowing us to secure the entire fleet with mutual TLS-authenticated traffic (with additional SPIFFE-based authorization access controls built in-house).

Mcrouter was open sourced by Facebook in 2014 and played a crucial role in scaling their memcached deployment. It fits well into Pinterest’s architecture for similar reasons:

  • Mcrouter acts as an effective abstraction of the entire memcached server fleet by providing application developers a single endpoint for interacting with the entire cache fleet. Additionally, using Mcrouter as the single interface to the system ensures universal, globally consistent traffic behavior across all services and machines at Pinterest.
  • Mcrouter presents a decoupled control plane and data plane: the entire topology of the memcached server fleet is organized into “pools” (logical clusters), while all request routing policies and behaviors dictating the interaction between clients and server pools are managed independently.
  • Mcrouter’s configuration API provides robust building blocks for complex routing behaviors, including zone-affinity routing, replication for data redundancy, multi-level cache tiers, and shadow traffic.
  • As a layer 7 proxy that speaks memcached’s ASCII protocol, mcrouter exposes intelligent protocol-specific features like request manipulation (TTL modification, in-flight compression, and more).
  • Rich observability features are provided out of the box at no cost to the client application, providing detailed visibility into memcached traffic across all of our infrastructure. Most important to us are percentile request latency, throughput sliced along individual client and server dimensions, request trends by key prefixes and key patterns, and error rates for detecting misbehaving servers.

Figure 2: Overview of request routing from mcrouter to memcached. Every key prefix is associated with a routing policy; two examples are shown.

In practice, mcrouter is deployed as a service-colocated, out-of-process proxy sidecar. As shown in Figure 2, applications (written in any language) send memcached protocol requests to mcrouter on loopback, and mcrouter proxies those requests to thousands of upstream memcached servers. This architecture allows us to build out robust features in a fully managed cache server fleet, entirely transparent to consuming services.

While memcached has been a part of the Pinterest infrastructure stack ever since the early days, our strategy around scaling its client-side counterpart has evolved significantly over the years. In particular, routing and discovery was first done in client-side libraries (which was brittle and coupled with binary deploys), followed by an in-house built routing proxy (which didn’t provide extensible building blocks for high availability), and finally mcrouter.

Compute and Storage Efficiency

Memcached is highly efficient: a single r5.2xlarge EC2 instance is capable of sustaining in excess of 100K requests per second and tens of thousands of concurrent TCP connections without tangible client-side latency degradation, making memcached Pinterest’s most throughput-efficient production service. This is due in part both to well-written C and its architecture, which makes use of multiple worker threads that independently run a libevent-driven event loop to service incoming connections.

At Pinterest, memcached’s extstore drives huge wins in storage efficiency for use cases ranging from Visual Search to personalized search recommendation engines. Extstore expands cached data capacity to a locally mounted NVMe flash disk in addition to DRAM, which increases available per-instance storage capacity from ~55 GB (r5.2xlarge) to nearly 1.7 TB (i3.2xlarge) for a fraction of the instance cost. In practice, extstore has benefitted data capacity-bound use cases without sacrificing end-to-end latency despite several orders of magnitude in difference between DRAM and SSD response times. Extstore’s built-in tuning knobs has allowed us to identify a sweet spot that balances disk I/O, disk-to-memory recache rate, compaction frequency and aggressiveness, and client-side tail response time.

High Availability

All infrastructure systems at Pinterest are highly available, and our caching systems are no exception. Leveraging rich routing features in mcrouter, our memcached fleet has a wide array of fault tolerance features:

  • Automatic failover for partially degraded or completely offline servers. Networks are inherently flaky and lossy; the entire caching stack assumes this to be a non-negotiable fact and is designed to maintain availability when servers are unavailable or slow. Fortunately, cache data is transient by nature, which relaxes requirements around data durability which would otherwise be required of persistent stores like databases. Within Pinterest, mcrouter automatically fails over requests to a globally shared cluster when individual servers are offline or responding to requests too slowly, and it automatically brings servers back into the serving pool through active health checks. Combined with rich proxy-layer instrumentation on individual server failures, this allows operators to identify and replace misbehaving servers with minimal production downtime.
  • Data redundancy through transparent cross-zone replication. Critical use cases are replicated across multiple clusters spanning distinct AWS Availability Zones (AZs). This allows total loss of an AZ with zero downtime: all requests are automatically redirected to a healthy replica located in a separate AZ, where a complete redundant copy of the data is available.
  • Isolated shadow testing against real production traffic. Traffic routing features within mcrouter allow us to perform a variety of resiliency exercises including cluster-to-cluster dark traffic and artificial latency and downtime injection against real production requests without impacting production.

Load Balancing and Data Sharding

One of the key features of a distributed system is horizontal scalability — the ability to scale out rather than scale up to accommodate additional traffic growth. At Pinterest, the vast majority of our caching workloads are throughput-bound, requiring scaling the number of instances in the cluster roughly linearly proportionally to the volume of inbound requests. However, memcached itself is an extremely simple key-value store, which by itself has no knowledge of other peers in the cluster. How are hundreds of millions of requests per second actually routed through the network to the right servers?

Mcrouter applies a hashing algorithm against the cache key of every incoming request to deterministically shard the request to one host within a pool. This works well for evenly distributing traffic among servers, but memcached has a unique requirement that its clusters need to be _arbitrarily scalable _— operators need to be able to freely adjust cluster capacity in response to changing traffic demands while minimizing the client-side impact.

Consistent hashing ensures that most of a keyspace partition maps to the same server even as the total number of eligible shards increases or decreases. This allows the system to scale out transparently to the client layer due to highly localized and predictable hit rate impact, thus reducing the chance that small changes in capacity cause catastrophic drops in cluster-wide hit rate.

Figure 3: Consistent hashing keeps most of the keyspace server allocation intact when one node is added to an existing pool.

The client-side routing layer maps a single key prefix to one or more such consistently hashed pools behind one of several routing policies, including AZ-affinity preference routing for cross-AZ replicated clusters, L1L2 routing for in-memory clusters backed by a fallthrough flash-based capacity cluster, and more. This allows isolating traffic and thus allocating capacity by client use case, and it ensures consistent cache routing behavior from any client machine in Pinterest’s fleet.

Tradeoffs and Considerations

All sufficiently complex infrastructure systems are characterized by (often highly nuanced) trade-offs. In the course of building out and scaling our caching systems, we weighed the costs and benefits of many trade-offs. A few are highlighted below:

  • An intermediary proxy layer presents a non-trivial additional amount of both compute and I/O overhead, especially for a performance-critical system with tight latency SLOs. However, the high availability abstractions, flexible routing behaviors, and many other features provided by mcrouter far outweigh the performance penalties.
  • A globally shared proxy configuration presents risks in change rollouts, since all control plane changes are applied across the entire fleet of tens of thousands of machines at Pinterest on deployment. However, this also ensures globally consistent knowledge of the memcached fleet topology and associated routing policies, regardless of where or how a client is deployed within Pinterest.
  • We operate ~100 distinct memcached clusters, many of which have different tenancy characteristics (dedicated versus shared), hardware instance types, and routing policies. While this presents a sizable maintenance burden on the team, it also allows for effective performance and availability isolation per use case, while also providing opportunities for efficiency optimization by choosing parameters and instance types most appropriate for a particular workload’s usage profile.
  • Leveraging a consistent hashing scheme for load distribution among a pool of upstream servers works well for the majority of cases, even if the keyspace is characterized by clusters of similarly-prefixed keys. However, this doesn’t solve problems with hot keys — an abnormal increase in request volume for a specific set of keys still results in load imbalance caused by hot shard(s) in the server cluster.

Future Work

Looking forward, we hope to continue delivering improved efficiency, reliability, and performance of Pinterest’s cache infrastructure. This includes experimental projects like embedding memcached core directly into a host application process for performance-critical use cases (to allow memcached to share memory space with the service process as well as eliminate network and I/O overhead) and reliability projects like a designing a robust solution for multi-region redundancy.

Thanks to the entire Storage and Caching team at Pinterest for supporting this work, especially Ankita Girish Wagh and Lianghong Xu.

Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.
Tools mentioned in article
Open jobs at Pinterest
Machine Learning Engineer, Homefeed R...
San Francisco, CA

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. As a Pinterest employee, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping users make their lives better in the positive corner of the internet.

Homefeed is a discovery platform at Pinterest that helps users find and explore their personal interests. We work with some of the largest datasets in the world, tailoring over billions of unique content to 330M+ users. Our content ranges across all categories like home decor, fashion, food, DIY, technology, travel, automotive, and much more. Our dataset is rich with textual and visual content and has nice graph properties — harnessing these signals at scale is a significant challenge. The Homefeed ranking team focuses on the machine learning model that predicts how likely a user will interact with a certain piece of content, as well as leveraging those individual prediction scores for holistic optimization to present users with a feed of diverse content.

What you’ll do:

  • Work on state-of-the-art large-scale applied machine learning projects
  • Improve relevance and the user experience on Homefeed
  • Re-architect our deep learning models to improve their capacity and enable more use cases
  • Collaborate with other teams to build/incorporate various signals to machine learning models
  • Collaborate with other teams to extend our machine learning based solutions to other use cases

What we’re looking for:

  • Passionate about applied machine learning and deep learning
  • 8+ years experience applying machine learning methods in settings like recommender systems, search, user modeling, image recognition, graph representation learning, natural language processing


EPM Lead Developer, Adaptive Planning...
San Francisco, CA


The EPM technology team at Pinterest is looking for a senior EPM architect who has at least four years of technical experience in Workday Adaptive Planning. You will be the solutions architect who oversees technical design of the complete EPM ecosystem with emphasis on Adaptive Financial and Workforce planning. The right candidate will also need to have hands-on development experience with Adaptive Planning and related technologies. The role is in IT but will work very closely with FP&A and the greater Finance/Accounting teams. Experience with Tableau suite of tools is a plus.

What you'll do: 

  • Together with the EPM Technology team, you will own Adaptive Planning and all related services
  • Oversee architecture of existing Adaptive Planning solution and make suggestions for improvements
  • Solution and lead Adaptive Planning enhancement projects from beginning to end
  • Help EPM Technology team gain deeper understanding of Adaptive Planning and train the team on Adaptive Planning best practices
  • Establish strong relationship with Finance users and leadership to drive EPM roadmap for Adaptive Planning and related technologies
  • Help establish EPM Center of Excellence at Pinterest
  • This is a contract position at Pinterest. As such, the contractor who fills this role will be employed either by our staffing partner (ProUnlimited) or by an agency partner, and not an employee of Pinterest.
  • All interviews will be scheduled and/or conducted by the Pinterest assignment manager. When a finalist has been selected, ProUnlimited or the agency partner will extend the offer and provide assignment details including duration, benefits options and onboarding details.

What we're looking for: 

  • Hands-on design and build experience with all Adaptive Planning technologies: standard sheets, cube sheets, all dimensions, reporting, integration framework, security, dashboarding and OfficeConnect
  • Strong in application design, data integration and application project lifecycle
  • Comfortable working side-by-side with business
  • Ability to translate business requirements to technical requirements
  • Strong understanding in all three financial statements and the different enterprise planning cycles
  • Familiar with Tableau suite of tools


Machine Learning Engineer, Content Si...
Toronto, CA

About Pinterest:  

Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love. As a Pinterest employee, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping users make their lives better in the positive corner of the internet.

On the Content Signals team, you’ll be responsible for building machine learning signals from NLP and CV components to productionizing the end product in batch and real-time setting at Pinterest scale. Our systems offer rich semantics to the recommendation platform and enable the product engineers to build deeper experiences to further engage Pinners. In understanding structured and unstructured content, we leverage embeddings, supervised and semi-supervised learning, and LSH. To scale our systems we leverage Spark, Flink, and low-latency model serving infrastructure.

What you’ll do:

  • Apply machine learning approaches to build rich signals that enable ranking and product engineers to build deeper experiences to further engage Pinners
  • Own, improve, and scale signals over both structured and unstructured content that bring tens of millions of rich content to Pinterest each day
  • Drive the roadmap for next-generation content signals that improve the content ecosystem at Pinterest.

What we’re looking for:

  • Deep expertise in content modeling at consumer Internet scale
  • Strong ability to work cross-functionally and with partner engineering teams
  • Expert in Java, Scala or Python


Senior Software Engineer, Shopping Co...
Toronto, CA

Pinterest is aiming to build a world-class shopping experience for our users, and has a unique advantage to succeed due to the high shopping intent of Pinners. The new Shopping Content Mining team being founded in Toronto plays a critical role in this journey. This team is responsible for building a brand new platform for mining and understanding product data, including extracting high quality product attributes from web pages and free texts that come from all major retailers across the world, mining product reviews and product relationships, product classification, etc. The rich product data generated by this platform is the foundation of the unified product catalog, which powers all shopping experiences at Pinterest (e.g., product search & recommendations, product detail page, shop the look, shopping ads).

There are unique technical challenges for this team: building large scale systems that can process billions of products, Machine Learning models that require few training examples to generate wrappers for web pages, NLP models that can extract information from free-texts, easy-to-use human labelling tools that generate high quality labeled data. Your work will have a huge impact on improving the shopping experience of 400M+ Pinners and driving revenue growth for Pinterest.

What you’ll do:

  • As a backend engineer, design and build large scale systems that can process billions of products, e.g., information extraction systems using XML parsers.
  • Design and build systems / tools, e.g.,
    • UI for data labeling and ML model diagnostic
    • feature extraction for ML models
    • extraction template / model fast deployment
    • evaluation / outlier detection system for data quality
  • Drive cross functional collaborations with partner teams working on shopping

What we’re looking for:

  • 5+ years of industry experience
  • Expert in Python and Java
  • Hands-on experience with big data technologies (e.g., Hadoop/Spark) and scalable realtime systems that process stream data
  • Nice to have: Familiarity with information extraction techniques for web-pages and free-texts, Experience working with shopping data is a plus, Experience building internal tools for labeling / diagnosing, Basic knowledge of machine learning (or willing to learn!): feature extraction, training, etc.


Verified by
Security Software Engineer
Tech Lead, Big Data Platform
Software Engineer
Talent Brand Manager
Software Engineer
You may also like