Stitch was initially developed to do the timelines and counts for our stats pages. This is where users can see which of their tracks were played and when.


  • Stitch is a wrapper around a Cassandra database. It has a web application that provides read-access to the counts through an HTTP API. The counts are written to Cassandra in two distinct ways, and it's possible to use either or both of them:

    • Real-time: For real-time updates, Stitch has a processor application that handles a stream of events coming from a broker and increments the appropriate counts in Cassandra.

    • Batch: The batch part is a MapReduce job running on Hadoop that reads event logs, calculates the overall totals, and bulk loads this into Cassandra.


  • Initially, Stitch only supported real-time updates and addressed this problem with a MapReduce job named The Restorator that performed the following actions:

    • Calculated the expected totals
    • Queried Cassandra to get the values it had for each counter
    • Calculated the increments needed to apply to fix the counters
    • Applied the increments

    Meanwhile, to stop the sand shifting under its feet, The Restorator needed to coordinate a locking system between itself and the real-time processors, so that the processors did not try to simultaneously apply increments to the same counter, resulting in a race-condition. It used ZooKeeper for this.


Stack Match

Favorite
Views
313
Favorite
Views
313