Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest-growing social networks online, Pinterest is the third-largest such network, behind only Facebook and Twitter.



  • We currently log 20 terabytes of new data each day, and have around 10 petabytes of data in S3.


  • We ultimately migrated our Hadoop jobs to Qubole, a rising player in the Hadoop-as-a-Service space. Because EMR had become unstable at our scale, we had to move quickly to a provider that played well with AWS (specifically, spot instances) and S3. Qubole supported AWS/S3 and was relatively easy to get started on. After vetting Qubole and comparing its performance against alternatives (including managed clusters), we decided to go with Qubole.


  • Like many large-scale websites, Pinterest’s infrastructure consists of servers that communicate with backend services, each composed of a number of individual servers for managing load and fault tolerance. Ideally, we’d like the configuration to reflect only the active hosts, so clients don’t need to deal with bad hosts as often. ZooKeeper provides a well-known pattern to solve this problem.


  • The massive volume of discovery data that powers Pinterest and enables people to save Pins, create boards and follow other users is generated through daily Hadoop jobs...


  • When you visit the site, you talk to a load balancer, which chooses a Varnish front-end, which in turn talks to our web front-ends, which used to run nine Python processes. Each of these processes serves the exact same version on any given web front-end.
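
The excerpt on ZooKeeper above does not spell out which pattern it means. A common one, and the one assumed here, is ephemeral-node registration: each backend host creates an ephemeral node under a service path, ZooKeeper removes that node when the host's session ends, and clients watch the path so their host list contains only live hosts. The sketch below simulates that idea in memory and does not talk to a real ZooKeeper ensemble. `FakeZooKeeper`, `ServiceClient`, the `/services/backend` path and the host addresses are all hypothetical names made up for illustration, not Pinterest's code or ZooKeeper's API.

```python
class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper (hypothetical, illustration only)."""

    def __init__(self):
        self._ephemeral = {}  # node path -> id of the session that owns it
        self._watchers = {}   # parent path -> callbacks to run on changes

    def create_ephemeral(self, session_id, path):
        # An ephemeral node lives only as long as its owning session.
        self._ephemeral[path] = session_id
        self._notify(path.rsplit("/", 1)[0])

    def close_session(self, session_id):
        # When a session ends (host crash, timeout), its nodes disappear.
        dead = [p for p, s in self._ephemeral.items() if s == session_id]
        for path in dead:
            del self._ephemeral[path]
        for parent in {p.rsplit("/", 1)[0] for p in dead}:
            self._notify(parent)

    def get_children(self, parent):
        prefix = parent + "/"
        return sorted(p[len(prefix):] for p in self._ephemeral
                      if p.startswith(prefix))

    def watch_children(self, parent, callback):
        # Simplification: this callback stays registered and fires on every
        # change; classic ZooKeeper watches fire once and must be re-set.
        self._watchers.setdefault(parent, []).append(callback)
        callback(self.get_children(parent))

    def _notify(self, parent):
        for callback in self._watchers.get(parent, []):
            callback(self.get_children(parent))


class ServiceClient:
    """Keeps a local list of active hosts in sync with the registry."""

    def __init__(self, zk, service_path):
        self.active_hosts = []
        zk.watch_children(service_path, self._on_change)

    def _on_change(self, hosts):
        self.active_hosts = hosts


zk = FakeZooKeeper()
client = ServiceClient(zk, "/services/backend")
zk.create_ephemeral("session-1", "/services/backend/10.0.0.1:9090")
zk.create_ephemeral("session-2", "/services/backend/10.0.0.2:9090")
zk.close_session("session-1")  # simulate the first host dying
print(client.active_hosts)     # ['10.0.0.2:9090']
```

With a real ensemble, a client library would take the place of `FakeZooKeeper`. The point of the design is that hosts never have to deregister explicitly: a failed host drops out of every client's list once its session expires.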




