Nov 12, 2016
Presto and Parquet for engineering analtyics
By mid-2016, Uber’s team was running more than one hundred thousand analytic queries daily. To keep up, they decided to redesign their analytics system, leveraging Presto, an open source SQL engine for large datasets, and Parquet, a columnar storage format for Hadoop.
Presto was chosen for a few reasons, including its scalability (according to Uber, it can access over five petabytes of data, and completes more than 90% of queries within 60 seconds).
To store its data, Uber also uses Parquet, a Hadoop storage solution that is compressible, has a columnar storage format, is encoded, and has ground-up support for nested data sets. Uber stores its data in columns instead of rows, because it removes the need to scan and discard unwanted data in rows. Columnar storage means more disk space saved, and improved query performance for large datasets.

