Broccoli: Syncing faster by syncing less
Dropbox syncs petabytes of data every day across millions of desktop clients. It is vital that we constantly improve the sync experience for our users, to increase our users’ productivity in their everyday lives. We also constantly strive to better leverage our infrastructure, increasing Dropbox’s operational efficiency. Because the files themselves are being sent to and from Dropbox during sync, we can leverage redundancies and recurring patterns in common file formats to send files more tersely and consequentially improve performance. Modern compression techniques identify and persist these redundancies and patterns using short bit codes, to transfer large files as smaller, losslessly compressed, files. Thus, compressing files before syncing them means less data on the wire (decreased bandwidth usage and latency for the end user!) and storing smaller pieces in the back end (increased storage cost savings!). To enable these advancements, we measured several common lossless compression algorithms on the incoming data stream at Dropbox including 7zip, zstd, zlib, and Brotli, and we chose to use a slightly modified Brotli encoder, we call Broccoli, to compress files before syncing.