
Pinterest's profile on StackShare is not actively maintained, so the information here may be out of date.
Apr 13, 2022
A consistent deployment plane across teams at Pinterest for orchestrating deploys, using EC2 APIs underneath.
Apr 13, 2022
Easy to store keys and values as objects. Easy to classify data as personally identifiable vs. non-critical. Easy to add access controls through S3 bucket policies and IAM policies.
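As a rough illustration of that pattern, here is a minimal Go sketch (the bucket name, key layout, and tag are hypothetical, not Pinterest's): it stores a key/value pair as an S3 object and tags it with a classification that bucket policies and IAM policies can reference, for example via `s3:ExistingObjectTag` conditions.

```go
// Minimal sketch, not Pinterest's actual code: store a value as an S3 object
// and tag it with a data classification, using the AWS SDK for Go.
package main

import (
	"bytes"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := s3.New(sess)

	// Store the key/value pair as an object; the tag records whether the
	// payload is personally identifiable, so bucket and IAM policies can key off it.
	_, err := svc.PutObject(&s3.PutObjectInput{
		Bucket:  aws.String("example-user-data"), // hypothetical bucket
		Key:     aws.String("users/42/profile"),  // hypothetical key layout
		Body:    bytes.NewReader([]byte(`{"name":"..."}`)),
		Tagging: aws.String("classification=pii"),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```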
Apr 13, 2022
The security team builds services, solutions, and tools for teams within Pinterest to manage access to critical production resources, as well as to facilitate adding authentication, authorization, and accounting within critical production microservices. Issuing identities to more than 130k AWS EC2 instances, then using them to make authentication and authorization decisions under high-bandwidth, critical traffic flows while services communicate in a mesh, requires a great deal of performance and stability. Go provides exactly that. In addition, the security team's primary engineering skill set need not cover the complex programming logic required in Java/Kotlin, while we avoid the pitfalls of runtime failures and uncertain behavior that Python can show in production. That led us to Go.
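As a hedged sketch of the kind of per-request check such a system performs (this is not Pinterest's implementation; the identity name and file paths are invented), here is a small Go HTTPS service that authenticates callers by client certificate (mTLS) and authorizes them against an allowlist:

```go
// Minimal sketch, not Pinterest's implementation: an HTTPS service that
// authenticates callers via mTLS and authorizes them against an allowlist.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"
)

// Hypothetical service identities permitted to call this endpoint.
var allowed = map[string]bool{"ads-serving": true}

func handler(w http.ResponseWriter, r *http.Request) {
	// With tls.RequireAndVerifyClientCert, the cert chain is already verified
	// against ClientCAs; here we only make the authorization decision.
	peer := r.TLS.PeerCertificates[0].Subject.CommonName
	if !allowed[peer] {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}
	w.Write([]byte("ok"))
}

func main() {
	caPEM, err := os.ReadFile("ca.pem") // CA that issued the instance identities
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	srv := &http.Server{
		Addr:    ":8443",
		Handler: http.HandlerFunc(handler),
		TLSConfig: &tls.Config{
			ClientAuth: tls.RequireAndVerifyClientCert,
			ClientCAs:  pool,
		},
	}
	log.Fatal(srv.ListenAndServeTLS("server.pem", "server-key.pem"))
}
```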

Nov 27, 2019
To meet employees’ critical need for interactive querying, we’ve worked with Presto, an open-source distributed SQL query engine, over the years. Operating Presto at Pinterest’s scale has involved resolving quite a few challenges: supporting deeply nested and huge Thrift schemas, slow/bad worker detection and remediation, cluster auto-scaling, graceful cluster shutdown, and impersonation support for the LDAP authenticator.
Our infrastructure is built on top of Amazon EC2 and we leverage Amazon S3 for storing our data. This separates compute and storage layers, and allows multiple compute clusters to share the S3 data.
We have hundreds of petabytes of data and tens of thousands of Apache Hive tables. Our Presto clusters comprise a fleet of 450 r4.8xl EC2 instances. Together, the clusters have over 100 TB of memory and 14K vCPU cores. Within Pinterest, more than 1,000 monthly active users (out of 1,600+ Pinterest employees in total) use Presto, running about 400K queries on these clusters per month.
Each query submitted to a Presto cluster is logged to a Kafka topic via Singer, a logging agent built at Pinterest that we covered in a previous post. A query is logged both when it is submitted and when it finishes. When a Presto cluster crashes, we are left with query-submitted events that have no corresponding query-finished events, and these unmatched events let us capture the effect of cluster crashes over time.
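The pairing logic is simple enough to sketch in Go (the event fields are assumed for illustration, not Singer's actual schema): queries that were submitted but never finished mark the blast radius of a crash.

```go
// Sketch of the event-pairing logic: match query-submitted events with
// query-finished events; submissions with no finish indicate queries that
// were lost to a cluster crash.
package main

import "fmt"

type QueryEvent struct {
	QueryID string
	Kind    string // "submitted" or "finished" (assumed field names)
}

// unfinished returns the IDs of queries that were submitted but never finished.
func unfinished(events []QueryEvent) []string {
	submitted := map[string]bool{}
	for _, e := range events {
		switch e.Kind {
		case "submitted":
			submitted[e.QueryID] = true
		case "finished":
			delete(submitted, e.QueryID)
		}
	}
	ids := make([]string, 0, len(submitted))
	for id := range submitted {
		ids = append(ids, id)
	}
	return ids
}

func main() {
	events := []QueryEvent{
		{"q1", "submitted"}, {"q1", "finished"},
		{"q2", "submitted"}, // cluster crashed before q2 finished
	}
	fmt.Println(unfinished(events)) // [q2]
}
```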
Each Presto cluster at Pinterest has workers on a mix of dedicated AWS EC2 instances and Kubernetes pods. The Kubernetes platform lets us add and remove workers from a Presto cluster very quickly: the best-case latency for bringing up a new worker is under a minute, although when the Kubernetes cluster itself is out of resources and needs to scale up, it can take up to ten minutes. Another advantage of deploying on Kubernetes is that our Presto deployment becomes agnostic of cloud vendor, instance type, OS, and so on.
Nov 20, 2019
One of our top priorities at Pinterest is fostering a safe and trustworthy experience for all Pinners. As Pinterest’s user base and ads business grow, review volume has been increasing exponentially, and more content types require moderation support. To solve greater engineering and operational challenges at scale, we needed a highly reliable and performant system to detect, report, evaluate, and act on abusive content and users, so we created Pinqueue.
Pinqueue 3.0 serves as a generic platform for content moderation and human labeling. Under the hood, it is a Flask + React app powered by Pinterest’s very own Gestalt UI framework. On the backend, Pinqueue 3.0 relies heavily on PinLater, a Pinterest-built reliable asynchronous job execution system, to handle requests for enqueueing and action-taking. Using PinLater has significantly strengthened Pinqueue 3.0’s overall infrastructure with its capability of processing a massive load of events with configurable retry policies.
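PinLater's actual API is Thrift-based and differs from this, but a minimal Go sketch conveys the core idea Pinqueue leans on: jobs are enqueued and executed asynchronously under a configurable retry policy (here, capped attempts with exponential backoff).

```go
// Conceptual sketch only; not PinLater's real interface. Jobs are enqueued
// onto a channel and a worker executes them, retrying failures per policy.
package main

import (
	"errors"
	"fmt"
	"time"
)

type RetryPolicy struct {
	MaxAttempts int
	BaseDelay   time.Duration
}

type Job struct {
	Name string
	Run  func() error
}

// worker drains the queue, retrying failed jobs according to the policy.
func worker(queue <-chan Job, policy RetryPolicy, done chan<- struct{}) {
	for job := range queue {
		delay := policy.BaseDelay
		for attempt := 1; ; attempt++ {
			err := job.Run()
			if err == nil {
				fmt.Printf("%s: succeeded on attempt %d\n", job.Name, attempt)
				break
			}
			if attempt == policy.MaxAttempts {
				fmt.Printf("%s: giving up: %v\n", job.Name, err)
				break
			}
			time.Sleep(delay)
			delay *= 2 // exponential backoff between attempts
		}
	}
	done <- struct{}{}
}

func main() {
	queue := make(chan Job, 8)
	done := make(chan struct{})
	go worker(queue, RetryPolicy{MaxAttempts: 3, BaseDelay: 10 * time.Millisecond}, done)

	tries := 0
	queue <- Job{Name: "take-down-content", Run: func() error {
		tries++
		if tries < 3 {
			return errors.New("transient failure")
		}
		return nil
	}}
	close(queue)
	<-done
}
```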
Hundreds of millions of people around the world use Pinterest to discover and do what they love, and our job is to protect them from abusive and harmful content. We’re committed to providing an inspirational yet safe experience to all Pinners. Solving trust & safety problems is a joint effort requiring expertise across multiple domains. Pinqueue3.0 not only plays a critical role in responsively taking down unsafe content, it also has become an enabler for future ML/automation initiatives by providing high-quality human labels. Going forward, we will continue to improve the review experience, measure review quality and collaborate with our machine learning teams to solve content moderation beyond manual reviews at an even larger scale.
Nov 28, 2018
By late 2018, Pinterest was running one of the largest cloud Kafka deployments in the world, handling internal use cases like metrics, reporting, and monitoring, while also powering realtime features in the app, like content recommendations and spam detection.
To deliver that breadth of coverage, Pinterest runs more than 2,000 brokers on AWS, “transporting more than 800 billion messages and more than 1.2 petabytes per day, and handling more than 15 million messages per second during the peak hours.”
They run Kafka in three AWS regions, with most of the brokers in us-east-1, and with MirrorMaker transporting data among the three regions. Within each region, brokers are spread among multiple clusters for topic-level isolation, limiting exposure across topics if one cluster fails.
Jul 12, 2017
In early 2017, Pinterest began moving its workload from EC2 instances to Docker containers with the goals of improving developer velocity, increasing reliability through immutable infrastructure, simplifying upgrades, and improving overall efficiency.
A single AMI (Amazon Machine Image) is shared by all containerized services, and all service-specific dependencies are packaged into the container itself.
Developers use Pinterest’s open-source tool Teletraan to launch containers, and the Docker container engine acts as the process manager to monitor and restart containers.
Dec 11, 2015
In late 2015, following the Series G, Pinterest began migrating their web experience to React, primarily because they “found React rendered faster than our previous template engine, had fewer obstacles to iterating on features and had a large developer community.”
The legacy setup consisted of Django, Python and Jinja on the backend, with Nunjucks handling template rendering on the client side. They wanted to move to React for template rendering across the board, but if they “switched the client-side rendering engine from Nunjucks to React, [they’d] also have to switch [their] server-side rendering, so they could share the same template syntax.”
They decided on an iterative approach that consolidated client and server onto a single template rendering engine, since “If the server could interpret JavaScript, and use Nunjucks to render templates and share our client-side code, we could then move forward with an iterative migration to React.” The team decided to stand up a Node process behind Nginx and interpret JavaScript server-side.
Now, when a user agent makes a request, each module being rendered requests the data it needs via an API call. Concurrently, a separate network call is made “to a co-located Node process to render the template as far as it can go with the data that it has.”
Node then responds with the rendered templates, along with a “holes” array indicating what data was still needed to complete the render. Finally, the Python webapp makes an API call to fetch the remaining data, and each module is sent back to Node as a completely independent module request, in parallel.
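A hedged sketch of that round trip, written in Go purely for illustration (the real system is a Python webapp talking to Node; the endpoint and field names here are invented):

```go
// Hypothetical sketch of the webapp side of the render round trip: post
// module data to a co-located render service, read back HTML plus a "holes"
// array, then re-render each hole as an independent, parallel request.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

type RenderRequest struct {
	Module string         `json:"module"`
	Data   map[string]any `json:"data"`
}

type RenderResponse struct {
	HTML  string   `json:"html"`
	Holes []string `json:"holes"` // modules the renderer lacked data for
}

func render(req RenderRequest) (RenderResponse, error) {
	body, _ := json.Marshal(req)
	resp, err := http.Post("http://localhost:8001/render", "application/json", bytes.NewReader(body))
	if err != nil {
		return RenderResponse{}, err
	}
	defer resp.Body.Close()
	var out RenderResponse
	return out, json.NewDecoder(resp.Body).Decode(&out)
}

func main() {
	first, err := render(RenderRequest{Module: "page", Data: fetchInitialData()})
	if err != nil {
		panic(err)
	}
	// Fetch the remaining data and re-render each hole in parallel.
	var wg sync.WaitGroup
	for _, hole := range first.Holes {
		wg.Add(1)
		go func(module string) {
			defer wg.Done()
			sub, err := render(RenderRequest{Module: module, Data: fetchDataFor(module)})
			if err == nil {
				fmt.Println(module, "->", sub.HTML)
			}
		}(hole)
	}
	wg.Wait()
}

// Stand-ins for the API calls the webapp would make.
func fetchInitialData() map[string]any          { return map[string]any{} }
func fetchDataFor(module string) map[string]any { return map[string]any{} }
```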
With this framework in place, Pinterest developers are in the process of replacing Nunjucks code with React components throughout the codebase.
Oct 1, 2014
Like many large-scale websites, Pinterest’s infrastructure consists of servers that communicate with backend services, each composed of a number of individual servers for managing load and fault tolerance. Ideally, we’d like the configuration to reflect only the active hosts, so clients don’t need to deal with bad hosts as often. ZooKeeper provides a well-known pattern to solve this problem.
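That pattern is ephemeral znodes: each backend registers a node tied to its session, so a crashed host’s entry disappears automatically and watchers always see the live set. A minimal Go sketch using the community go-zookeeper client (server address and paths are placeholders):

```go
// Sketch of the ephemeral-node service-discovery pattern. Each server
// registers an ephemeral znode; if it dies, its session lapses and the node
// vanishes, so clients watching the parent see only live hosts.
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/go-zookeeper/zk"
)

func main() {
	conn, _, err := zk.Connect([]string{"127.0.0.1:2181"}, 5*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Register this server (assumes the parent path /services/search exists).
	// The znode disappears automatically if this process crashes.
	_, err = conn.Create("/services/search/host-1:8080", nil,
		zk.FlagEphemeral, zk.WorldACL(zk.PermAll))
	if err != nil {
		log.Fatal(err)
	}

	// A client lists the active hosts and watches for membership changes.
	hosts, _, events, err := conn.ChildrenW("/services/search")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("active hosts:", hosts)
	fmt.Println("membership changed:", <-events)
}
```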
Jul 11, 2014
By mid-2014, around the time of the Series F, Pinterest users had already created more than 30 billion Pins, and the company was logging around 20 terabytes of new data daily, with around 10 petabytes of data in S3. To drive personalization for its users, and to empower engineers to build big data applications quickly, the data team built a self-serve Hadoop platform.
To start, they decoupled compute from storage, which meant teams would have to worry less about loading or synchronizing data, allowing existing or future clusters to make use of the data across a single shared file system.
A centralized Hive metastore acts as the source of truth. They chose Hive for most of their Hadoop jobs “primarily because the SQL interface is simple and familiar to people across the industry.”
Dependency management takes place across three layers: **Baked AMIs**, which are large, slow-loading dependencies pre-loaded onto images; **Automated Configurations (Masterless Puppet)**, which allows Puppet clients to “pull their configuration from S3 and set up a service that’s responsible for keeping S3 configurations in sync with the Puppet master;” and **Runtime Staging on S3**, which creates a working directory at runtime for each developer that pulls down its dependencies directly from S3.
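As a hedged sketch of that last layer (the bucket and keys are invented, and this is not Pinterest’s code), a job could stage its dependencies from S3 into a fresh working directory at startup:

```go
// Sketch of "runtime staging": at job start, pull each declared dependency
// from S3 into a fresh per-run working directory, using the AWS SDK for Go.
package main

import (
	"log"
	"os"
	"path/filepath"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession())
	downloader := s3manager.NewDownloader(sess)

	// Fresh working directory for this run.
	workDir, err := os.MkdirTemp("", "hadoop-job-")
	if err != nil {
		log.Fatal(err)
	}

	deps := []string{"jars/job.jar", "conf/job.properties"} // hypothetical keys
	for _, key := range deps {
		f, err := os.Create(filepath.Join(workDir, filepath.Base(key)))
		if err != nil {
			log.Fatal(err)
		}
		_, err = downloader.Download(f, &s3.GetObjectInput{
			Bucket: aws.String("example-deploy-bucket"), // hypothetical bucket
			Key:    aws.String(key),
		})
		f.Close()
		if err != nil {
			log.Fatal(err)
		}
	}
	log.Println("staged dependencies into", workDir)
}
```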
Finally, they migrated their Hadoop jobs to Qubole, which “supported AWS/S3 and was relatively easy to get started on.”