John Egan

Pinterest

Decision at Pinterest about Zookeeper

Pinterest

ZooKeeper manages our state and tells each node which version of the code it should be running.
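A minimal sketch of how nodes might follow a version pinned in a coordination store. It uses an in-memory stand-in rather than a real ZooKeeper client, and the names `VersionStore`, `Node`, and the key `/deploy/web/version` are hypothetical, chosen only for illustration; nothing here is taken from Pinterest's actual setup.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class VersionStore:
    """In-memory stand-in for a coordination service (hypothetical, not a ZooKeeper client)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def watch(self, key: str, callback: Callable[[str], None]) -> None:
        # Register the callback and fire it right away if a value already exists.
        self._watchers[key].append(callback)
        if key in self._data:
            callback(self._data[key])

    def set(self, key: str, value: str) -> None:
        # Store the new value and notify every watcher of this key.
        self._data[key] = value
        for callback in self._watchers[key]:
            callback(value)


class Node:
    """A server that runs whatever version the store tells it to run."""

    def __init__(self, name: str, store: VersionStore, key: str) -> None:
        self.name = name
        self.running_version: Optional[str] = None
        store.watch(key, self._on_version)

    def _on_version(self, version: str) -> None:
        # A real node would fetch and activate the matching build here.
        self.running_version = version


KEY = "/deploy/web/version"  # hypothetical path
store = VersionStore()
nodes = [Node(f"web-{i}", store, KEY) for i in range(3)]
store.set(KEY, "build-101")
versions = {node.running_version for node in nodes}
print(versions)
```

Pushing a new value with `store.set(KEY, ...)` updates every node at once, which is the property the description relies on: a single source of truth for the version each node should run.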

Decision at Pinterest about HBase

Pinterest

The final output is inserted into HBase to serve the experiment dashboard. We also load the output data into Redshift for ad hoc analysis. For real-time experiment data, we use Storm to tail Kafka, process the data as it arrives, and insert metrics into MySQL, so that we can identify group allocation problems and send out real-time alerts and metrics.
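One way to picture a "group allocation problem" check is the sketch below. It is a plain Python function over a list of events, not Storm, Kafka, or MySQL code, and the event shape `(user_id, group)`, the `expected_share` mapping, and the `tolerance` threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def allocation_alerts(
    events: Iterable[Tuple[str, str]],
    expected_share: Dict[str, float],
    tolerance: float = 0.05,
) -> List[str]:
    """Return the groups whose observed share of users deviates from the expected share.

    events: (user_id, group) pairs, for example as read from a stream.
    expected_share: group -> intended fraction of users (hypothetical config).
    tolerance: allowed absolute deviation (hypothetical threshold).
    """
    # Count each user once per group so repeated events do not skew the shares.
    counts = Counter(group for _, group in set(events))
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(
        group
        for group, share in expected_share.items()
        if abs(counts[group] / total - share) > tolerance
    )


expected = {"control": 0.5, "enabled": 0.5}
skewed = [(f"u{i}", "control" if i < 80 else "enabled") for i in range(100)]
print(allocation_alerts(skewed, expected))
```

In a streaming setting the same comparison would run over a sliding window of recent events; the sketch only shows the comparison itself.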

Decision at Pinterest about Hadoop

Pinterest

The MapReduce workflow starts processing experiment data nightly, once the previous day's data has been copied over from Kafka. At that point, all the raw log requests are transformed into meaningful experiment results and in-depth analysis. To populate experiment data for the dashboard, around 50 jobs run all the calculations and data transforms.
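To illustrate the map and reduce steps such a job performs, here is a small sketch in plain Python rather than Hadoop. The tab-separated log format (`user`, `experiment`, `group`, `action`) is an assumption made for the example, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Key = Tuple[str, str, str]  # (experiment, group, action)


def map_record(line: str) -> Iterator[Tuple[Key, int]]:
    """Map step: turn one raw log line into a keyed count of 1."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        return  # skip malformed lines
    _user, experiment, group, action = parts
    yield (experiment, group, action), 1


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reduce step: sum the counts for each key."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines: Iterable[str]) -> Dict[Key, int]:
    """Run the map step over every line, then reduce the emitted pairs."""
    return reduce_counts(pair for line in lines for pair in map_record(line))


raw_log = [
    "u1\tnew_feed\tcontrol\trepin\n",
    "u2\tnew_feed\tenabled\trepin\n",
    "u3\tnew_feed\tenabled\trepin\n",
    "garbled line\n",
]
results = run_job(raw_log)
print(results)
```

A real nightly workflow would chain many such jobs, with each job's output feeding the next; this sketch covers only a single counting job.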

Decision at Pinterest about Amazon S3

Pinterest

Amazon S3 is where we keep our builds. It's a simple way to share data, and it scales with no intervention on our end.

Decision at Pinterest about Varnish

Pinterest

When you visit the site, you talk to a load balancer, which chooses a Varnish front-end, which in turn talks to our web front-ends, which used to run nine Python processes. Each of these processes serves the exact same version on any given web front-end.

Decision at Pinterest about Zookeeper

Pinterest

Like many large-scale websites, Pinterest's infrastructure consists of servers that communicate with backend services, each composed of a number of individual servers for managing load and fault tolerance. Ideally, we'd like the configuration to reflect only the active hosts, so clients don't need to deal with bad hosts as often. ZooKeeper provides a well-known pattern to solve this problem.
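The text does not name the pattern. A common one in ZooKeeper is for each host to register an ephemeral node, which ZooKeeper deletes when the session that created it ends, so the list of children reflects the live hosts. The sketch below assumes that is the pattern meant and imitates it in memory; `EphemeralRegistry`, the service name, and the host names are hypothetical, and no real ZooKeeper client is used.

```python
from typing import Dict, List


class EphemeralRegistry:
    """In-memory imitation of ephemeral-node registration (not a ZooKeeper client)."""

    def __init__(self) -> None:
        # service -> {host: session_id}; an entry lives only as long as its session.
        self._services: Dict[str, Dict[str, int]] = {}

    def register(self, service: str, host: str, session_id: int) -> None:
        # A host announces itself under the service it belongs to.
        self._services.setdefault(service, {})[host] = session_id

    def expire_session(self, session_id: int) -> None:
        # Imitates a session timeout: every entry created by that session disappears.
        for hosts in self._services.values():
            for host in [h for h, s in hosts.items() if s == session_id]:
                del hosts[host]

    def active_hosts(self, service: str) -> List[str]:
        # Clients read this list instead of a static configuration file.
        return sorted(self._services.get(service, {}))


registry = EphemeralRegistry()
registry.register("follower-service", "host-a:9090", session_id=1)
registry.register("follower-service", "host-b:9090", session_id=2)
registry.register("follower-service", "host-c:9090", session_id=3)

registry.expire_session(2)  # host-b stops heartbeating and its session times out
print(registry.active_hosts("follower-service"))
```

Because a dead host's entry vanishes with its session, clients that read the list see only hosts that are still connected, which is the "reflect only the active hosts" property described above.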

Decision at Pinterest about Hadoop

Pinterest

The massive volume of discovery data that powers Pinterest and enables people to save Pins, create boards, and follow other users is generated through daily Hadoop jobs...

Decision at Pinterest about Amazon S3

Pinterest

We currently log 20 terabytes of new data each day and have around 10 petabytes of data in S3.
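For a sense of scale, the two figures can be related with a little arithmetic. The sketch assumes decimal units (1 PB = 1000 TB) and a constant logging rate, neither of which the text states, so the results are rough orders of magnitude rather than reported numbers.

```python
TB_PER_PB = 1000  # decimal units; an assumption, the text does not say which convention it uses

daily_new_tb = 20  # from the text: 20 terabytes of new data each day
stored_pb = 10     # from the text: around 10 petabytes in S3

# Days of logging at the current rate that would add up to the stored total.
days_equivalent = stored_pb * TB_PER_PB / daily_new_tb

# Data added per year if the daily rate stayed constant (an assumption).
yearly_growth_pb = daily_new_tb * 365 / TB_PER_PB

print(days_equivalent, yearly_growth_pb)
```

Under these assumptions, the stored total corresponds to about 500 days of logging at the current rate, and a constant rate would add roughly 7.3 PB per year.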
