Qubole

By mid-2014, around the time of Pinterest's Series F funding round, its users had created more than 30 billion Pins, and the company was logging roughly 20 terabytes of new data per day, with about 10 petabytes stored in S3. To drive personalization for its users and to let engineers build big data applications quickly, the data team built a self-serve Hadoop platform.

To start, they decoupled compute from storage. Because existing and future clusters all read from a single shared file system, teams spend less effort loading data into clusters or keeping copies in sync.

A centralized Hive metastore acts as the source of truth. They chose Hive for most of their Hadoop jobs "primarily because the SQL interface is simple and familiar to people across the industry."

Dependency management takes place across three layers. Baked AMIs hold large, slow-to-load dependencies pre-installed on machine images. Automated Configurations (Masterless Puppet) let Puppet clients "pull their configuration from S3 and set up a service that's responsible for keeping S3 configurations in sync with the Puppet master." Runtime Staging on S3 creates a working directory for each developer at runtime and pulls that developer's dependencies into it directly from S3.
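The runtime-staging layer can be illustrated with a minimal Python sketch. It assumes each run gets a fresh temporary directory and that dependencies are identified by S3 object keys; the function `stage_runtime`, the `fetch` callback, and the example keys are all hypothetical, since the source does not describe Pinterest's actual tooling. The download step is injected as a callback so the logic runs without AWS access.

```python
import os
import tempfile
from typing import Callable, Iterable

# Hypothetical fetcher signature: (s3_key, local_path) -> None.
# In production this could wrap a boto3 S3 client call such as
# client.download_file(bucket, key, local_path); here it is injected
# so the staging logic can run without AWS credentials.
Fetcher = Callable[[str, str], None]


def stage_runtime(job_name: str, dependency_keys: Iterable[str], fetch: Fetcher) -> str:
    """Create a fresh working directory for one run, pull each
    dependency into it, and return the directory path."""
    # A unique per-run directory keeps concurrent developers' runs isolated.
    workdir = tempfile.mkdtemp(prefix=f"{job_name}-")
    for key in dependency_keys:
        # Keep only the object's base name, e.g. "libs/udfs.jar" -> "udfs.jar".
        local_path = os.path.join(workdir, os.path.basename(key))
        fetch(key, local_path)
    return workdir


def fake_fetch(key: str, local_path: str) -> None:
    """Stand-in for an S3 download: writes a placeholder file locally."""
    with open(local_path, "w") as f:
        f.write(f"contents of {key}")


if __name__ == "__main__":
    wd = stage_runtime("daily-report", ["libs/udfs.jar", "scripts/report.hql"], fake_fetch)
    print(sorted(os.listdir(wd)))  # ['report.hql', 'udfs.jar']
```

Injecting the fetcher keeps the directory-per-run logic separate from the storage client, which matches the idea that dependencies live in S3 rather than on the cluster machines themselves.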

Finally, they migrated their Hadoop jobs to Qubole, which “supported AWS/S3 and was relatively easy to get started on.”

Qubole Discussions


StackShare Editors

Jul 11, 2014

Big Data at Pinterest with Hadoop, Hive, and Qubole


John Egan

Feb 13, 2014


We ultimately migrated our Hadoop jobs to Qubole, a rising player in the Hadoop-as-a-Service space. Given that EMR had become unstable at our scale, we had to move quickly to a provider that played well with AWS (specifically, spot instances) and S3. Qubole supported AWS/S3 and was relatively easy to get started on. After vetting Qubole and comparing its performance against alternatives (including managed clusters), we decided to go with Qubole.


Qubole Alternatives & Comparisons

Google BigQuery

Amazon Redshift

Snowflake

Amazon EMR

Stitch

Cloudera Enterprise
