Like many large scale web sites, Pinterest’s infrastructure consists of servers that communicate with backend services composed of a number of individual servers for managing load and fault tolerance. Ideally, we’d like the configuration to reflect only the active hosts, so clients don’t need to deal with bad hosts as often. ZooKeeper provides a well known pattern to solve this problem.
Initially, Stitch only supported real-time updates and addressed this problem with a MapReduce job named The Restorator that performed the following actions:
Meanwhile, to stop the sand shifting under its feet, The Restorator needed to coordinate a locking system between itself and the real-time processors, so that the processors did not try to simultaneously apply increments to the same counter, resulting in a race-condition. It used ZooKeeper for this.