Presto and Parquet for engineering analytics

By mid-2016, Uber’s team was running more than one hundred thousand analytic queries daily. To keep up, they redesigned their analytics system around Presto, an open-source distributed SQL query engine for large datasets, and Parquet, a columnar storage format for Hadoop.

Presto was chosen for several reasons, including its scalability: according to Uber, it can access more than five petabytes of data and completes more than 90% of queries within 60 seconds.

To store its data, Uber uses Parquet, a columnar storage format for Hadoop that supports compression and efficient encoding schemes and was designed from the ground up to handle nested data structures. Storing data in columns rather than rows means a query reads only the columns it needs, instead of scanning entire rows and discarding the fields it does not use. Columnar storage also saves disk space, since values from the same column sit together and compress well, and it improves query performance on large datasets.
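The toy sketch below illustrates the row-versus-column trade-off described above. It is a simplified in-memory model, not Parquet's file format or API: the trip schema is hypothetical, the "values read" count stands in for the bytes a real engine would read from disk, and the run-length encoder is a simplified stand-in for Parquet's actual encodings (such as dictionary encoding and a run-length/bit-packing hybrid).

```python
from typing import Any

# Toy trip records in row-oriented form: every row carries every field.
# This schema (trip_id, city, fare, driver_rating) is hypothetical.
ROWS: list[dict[str, Any]] = [
    {"trip_id": 1, "city": "SF", "fare": 12.0, "driver_rating": 4.9},
    {"trip_id": 2, "city": "SF", "fare": 8.0, "driver_rating": 4.7},
    {"trip_id": 3, "city": "NYC", "fare": 20.0, "driver_rating": 4.8},
    {"trip_id": 4, "city": "NYC", "fare": 16.0, "driver_rating": 5.0},
]


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row records into one list per column (a columnar layout)."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def avg_fare_row_store(rows: list[dict[str, Any]]) -> tuple[float, int]:
    """Row layout: each whole row is scanned, then every field except `fare` is discarded."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # all fields of the row are touched
        total += row["fare"]
    return total / len(rows), values_read


def avg_fare_column_store(columns: dict[str, list[Any]]) -> tuple[float, int]:
    """Column layout: only the `fare` column is read."""
    fares = columns["fare"]
    return sum(fares) / len(fares), len(fares)


def run_length_encode(values: list[Any]) -> list[list[Any]]:
    """Collapse runs of repeated values into [value, count] pairs.

    Grouping a column's values together creates such runs, which is one
    reason columnar data compresses well.
    """
    encoded: list[list[Any]] = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1  # extend the current run
        else:
            encoded.append([v, 1])  # start a new run
    return encoded


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(avg_fare_row_store(ROWS))            # (14.0, 16)
    print(avg_fare_column_store(columns))      # (14.0, 4)
    print(run_length_encode(columns["city"]))  # [['SF', 2], ['NYC', 2]]
```

Both layouts return the same average fare, but the row layout touches all 16 stored values while the column layout touches only the 4 fares. In a real table with dozens of columns and billions of rows, that gap is what lets a columnar format skip most of the data on disk.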
