Qubole

Prepare, integrate and explore Big Data in the cloud (Hive, MapReduce, Pig, Presto, Spark and Sqoop)

What is Qubole?

Qubole is a cloud-based service that makes big data easy for analysts and data engineers.
Qubole is a tool in the Big Data as a Service category of a tech stack.

Who uses Qubole?

Companies
4 companies reportedly use Qubole in their tech stacks, including Pinterest, SaleCycle, and MediaMath.

Developers
14 developers on StackShare have stated that they use Qubole.

Why do developers like Qubole?

Here’s a list of reasons why companies and developers use Qubole
Qubole Reviews

Here are some stack decisions, common use cases and reviews by companies and developers who chose Qubole in their tech stack.

Puppet Labs
Hadoop
Qubole

By mid-2014, around the time of the Series F, Pinterest users had already created more than 30 billion Pins, and the company was logging around 20 terabytes of new data daily, with around 10 petabytes of data in S3. To drive personalization for its users, and to empower engineers to build big data applications quickly, the data team built a self-serve Hadoop platform.

To start, they decoupled compute from storage, which meant teams would have to worry less about loading or synchronizing data, allowing existing or future clusters to make use of the data across a single shared file system.

A centralized Hive metastore acts as the source of truth. They chose Hive for most of their Hadoop jobs “primarily because the SQL interface is simple and familiar to people across the industry.”

Dependency management takes place across three layers: Baked AMIs, which are large, slow-loading dependencies pre-loaded on images; Automated Configurations (Masterless Puppets), which allow Puppet clients to “pull their configuration from S3 and set up a service that’s responsible for keeping S3 configurations in sync with the Puppet master;” and Runtime Staging on S3, which creates a working directory at runtime for each developer and pulls down that developer's dependencies directly from S3.
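
The runtime staging layer can be illustrated with a minimal sketch. This is not Pinterest's implementation: the function name `stage_dependencies`, the `staging/<developer>/` key layout, and the in-memory dictionary standing in for an S3 bucket are all assumptions made for illustration, so the example runs without cloud credentials.

```python
import tempfile
from pathlib import Path


def stage_dependencies(object_store: dict[str, bytes], developer: str, base_dir: Path) -> Path:
    """Create a fresh working directory for one developer and copy in
    every object stored under that developer's prefix.

    `object_store` is a hypothetical stand-in for an S3 bucket: it maps
    object keys to their contents.
    """
    prefix = f"staging/{developer}/"  # assumed key layout, not from the source
    workdir = Path(tempfile.mkdtemp(prefix=f"{developer}-", dir=base_dir))
    for key, payload in object_store.items():
        if not key.startswith(prefix):
            continue  # dependency belongs to another developer
        target = workdir / key[len(prefix):]
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
    return workdir
```

In a real deployment the dictionary lookup would presumably be replaced by calls to an S3 client; the point of the sketch is only that each developer gets an isolated directory populated at job start rather than at image build time.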

Finally, they migrated their Hadoop jobs to Qubole, which “supported AWS/S3 and was relatively easy to get started on.”

Qubole's Features

  • Intuitive GUI
  • Optimized Hive
  • Improved S3 Performance
  • Auto Scaling
  • Spot Instance Pricing
  • Managed Clusters
  • Cloud Integration
  • Cluster Lifecycle Management

Qubole Alternatives & Comparisons

What are some alternatives to Qubole?
Databricks
Databricks Unified Analytics Platform, from the original creators of Apache Spark™, unifies data science and engineering across the Machine Learning lifecycle from data preparation to experimentation and deployment of ML applications.
Snowflake
Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS)—no infrastructure to manage and no knobs to turn.
Amazon Redshift
Amazon Redshift is optimized for data sets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.
Google BigQuery
Run super-fast, SQL-like queries against terabytes of data in seconds, using the processing power of Google's infrastructure. Load data with ease. Bulk load your data using Google Cloud Storage or stream it in. Easy access. Access BigQuery by using a browser tool, a command-line tool, or by making calls to the BigQuery REST API with client libraries such as Java, PHP or Python.
Amazon EMR
Amazon EMR is used in a variety of applications, including log analysis, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.

Qubole's Followers
28 developers follow Qubole to keep up with related blogs and decisions.
Danny Polonsky
Ajay Ramkrishnan
vamsi gumma
laicuRoot
maziar55
Jay Jackson
Sajjad vafaie
Yury Buldakov
fadi assad
Mohamma76685757