Because coordinating distributed systems is a Zoo
Companies using ZooKeeper
How ZooKeeper is being used
  • Pinterest

    Like many large-scale websites, Pinterest’s infrastructure consists of servers that communicate with backend services, each composed of a number of individual servers for managing load and fault tolerance. Ideally, we’d like the configuration to reflect only the active hosts, so clients don’t need to deal with bad hosts as often. ZooKeeper provides a well-known pattern to solve this problem.
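
    The excerpt does not say which pattern is meant. One common approach is to have each host register an ephemeral node, which ZooKeeper removes when the session that created it ends, so the list of children reflects only live hosts. The sketch below is an in-memory simulation of that idea, not the ZooKeeper client API; the class name, paths, session ids, and addresses are all hypothetical.

```python
class EphemeralRegistry:
    """In-memory stand-in for a tree of ephemeral nodes (not a real ZooKeeper client)."""

    def __init__(self):
        # Maps a node path to the id of the session that created it.
        self._nodes = {}

    def register(self, session_id, service, host):
        """Create an ephemeral node for a host under its service."""
        path = f"/services/{service}/{host}"
        self._nodes[path] = session_id
        return path

    def expire_session(self, session_id):
        """Drop every node owned by the session, as happens when a host dies."""
        self._nodes = {
            path: owner for path, owner in self._nodes.items() if owner != session_id
        }

    def active_hosts(self, service):
        """Return the hosts that still have a node under the service."""
        prefix = f"/services/{service}/"
        return sorted(
            path[len(prefix):] for path in self._nodes if path.startswith(prefix)
        )


registry = EphemeralRegistry()
registry.register("session-a", "search", "10.0.0.1:9000")
registry.register("session-b", "search", "10.0.0.2:9000")

# The first host goes away; its session expires and its node disappears.
registry.expire_session("session-a")
live_hosts = registry.active_hosts("search")
```

    Clients that read `active_hosts` never see the host whose session expired, which is the property the excerpt asks for.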

  • Real-time counts with Stitch

    Initially, Stitch only supported real-time updates and addressed this problem with a MapReduce job named The Restorator that performed the following actions:

    • Calculated the expected totals
    • Queried Cassandra to get the values it had for each counter
    • Calculated the increments needed to apply to fix the counters
    • Applied the increments
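
    The four steps above can be sketched in a few lines. This is a minimal illustration, assuming counters are plain integers keyed by name; a dictionary stands in for Cassandra, and the function names and sample data are hypothetical, not taken from The Restorator.

```python
from collections import Counter


def compute_increments(expected, stored):
    """Return the delta each counter needs to reach its expected total."""
    increments = {}
    for counter, target in expected.items():
        delta = target - stored.get(counter, 0)
        if delta != 0:
            increments[counter] = delta
    return increments


def apply_increments(stored, increments):
    """Apply each delta to the stored value."""
    for counter, delta in increments.items():
        stored[counter] = stored.get(counter, 0) + delta


# Step 1: calculate the expected totals from the raw events.
events = ["track-1", "track-2", "track-1", "track-1"]
expected = Counter(events)

# Step 2: read the values currently stored (a dict stands in for Cassandra).
stored = {"track-1": 2, "track-2": 1}

# Steps 3 and 4: work out the corrections and apply them.
increments = compute_increments(expected, stored)
apply_increments(stored, increments)
```

    Applying deltas rather than overwriting totals keeps the repair compatible with a store that only supports increment operations.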

    Meanwhile, to stop the sand shifting under its feet, The Restorator needed to coordinate a locking system between itself and the real-time processors, so that the processors did not try to apply increments to the same counter at the same time, which would result in a race condition. It used ZooKeeper for this.
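
    The effect of per-counter locking can be shown inside a single process. In the sketch below, `threading.Lock` is only a stand-in for a distributed lock coordinated through ZooKeeper; the `CounterLocks` class and the counter names are hypothetical.

```python
import threading
from collections import defaultdict


class CounterLocks:
    """Hands out one lock per counter name (in-process stand-in for a distributed lock)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, counter):
        # Guard the dictionary so two threads never create two locks for one counter.
        with self._guard:
            return self._locks[counter]


counters = {"track-1": 0}
locks = CounterLocks()


def apply_increment(counter, delta):
    """Read-modify-write a counter while holding its lock."""
    with locks.lock_for(counter):
        current = counters[counter]
        counters[counter] = current + delta


def worker(times):
    for _ in range(times):
        apply_increment("track-1", 1)


# Two writers, standing in for a batch job and a real-time processor.
threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

final_value = counters["track-1"]
```

    Locking per counter rather than globally lets writers that touch different counters proceed without waiting on each other.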

  • Deploying software at Pinterest

    ZooKeeper manages our state and tells each node which version of the code it should be running.
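
    One way to picture this is a shared record of the target version that every node watches. The sketch below simulates that in memory and is not how Pinterest's deploy system or the ZooKeeper API actually works; `VersionStore`, `Node`, and the version strings are hypothetical.

```python
class VersionStore:
    """In-memory stand-in for a shared node holding the target code version."""

    def __init__(self, version):
        self._version = version
        self._watchers = []

    def watch(self, callback):
        """Register a callback and immediately report the current version."""
        self._watchers.append(callback)
        callback(self._version)

    def set_version(self, version):
        """Change the target version and notify every watcher."""
        self._version = version
        for callback in self._watchers:
            callback(version)


class Node:
    """A server that converges on whatever version the store announces."""

    def __init__(self, name):
        self.name = name
        self.running = None

    def on_target_version(self, version):
        if self.running != version:
            # A real node would fetch the build and restart here.
            self.running = version


store = VersionStore("v1")
nodes = [Node("web-1"), Node("web-2")]
for node in nodes:
    store.watch(node.on_target_version)

store.set_version("v2")
running = [node.running for node in nodes]
```

    Because every node reacts to the same stored value, rolling the fleet forward or back is a single write to the store.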