
Druid vs Apache Spark: What are the differences?

What is Druid? A fast, column-oriented, distributed data store. Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized datasets. It supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
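To make "fast aggregate queries" with "flexible filters" and "approximate algorithms" concrete, here is a minimal sketch that builds a Druid native timeseries query as a Python dict. The query shape (queryType, dataSource, granularity, intervals, filter, aggregations) follows Druid's native JSON query format, but the datasource name `page_views` and the columns `country`, `clicks`, and `user_id` are hypothetical placeholders. The `cardinality` aggregator is Druid's HyperLogLog-based approximate distinct count. Such a payload is typically POSTed to a Broker's `/druid/v2/` endpoint; the sketch only builds and serializes it.

```python
import json
from typing import Any, Dict


def build_timeseries_query(datasource: str, interval: str, country: str) -> Dict[str, Any]:
    """Build a Druid native timeseries query (hypothetical datasource and columns)."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        # Bucket results by day across the requested interval.
        "granularity": "day",
        # ISO-8601 interval: start/end.
        "intervals": [interval],
        # Exact-match filter on a dimension column.
        "filter": {"type": "selector", "dimension": "country", "value": country},
        "aggregations": [
            # Counts Druid rows, which may be rolled up at ingestion time.
            {"type": "count", "name": "rows"},
            # Exact sum of a numeric metric column.
            {"type": "longSum", "name": "total_clicks", "fieldName": "clicks"},
            # Approximate distinct count (HyperLogLog-based).
            {"type": "cardinality", "name": "approx_unique_users", "fields": ["user_id"]},
        ],
    }


query = build_timeseries_query("page_views", "2024-01-01/2024-01-08", "US")
payload = json.dumps(query, indent=2)
print(payload)
```

Mixing exact aggregators (`count`, `longSum`) with an approximate one (`cardinality`) in a single query mirrors the description above: Druid lets you trade exactness for speed per metric.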

What is Apache Spark? A fast and general engine for large-scale data processing. Spark is a fast, general-purpose processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or in Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and newer workloads such as streaming, interactive queries, and machine learning.
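The "batch processing (similar to MapReduce)" model can be illustrated with a word count. The sketch below is plain, single-process Python that mimics the map, shuffle, and reduce stages; it is not Spark code. In PySpark the same pipeline is usually written as a chain like `flatMap(...)`, `map(...)`, and `reduceByKey(operator.add)` on an RDD, with Spark distributing each stage across the cluster. The function names here are illustrative only.

```python
from collections import defaultdict
from functools import reduce
from operator import add
from typing import Callable, Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Emit a (word, 1) pair for every word, like flatMap + map in Spark.
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group values by key; in Spark this step moves data between executors.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]], fn: Callable[[int, int], int]) -> Dict[str, int]:
    # Combine each key's values with an associative function, like reduceByKey.
    return {key: reduce(fn, values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)), add)


counts = word_count(["spark druid spark", "Druid spark"])
print(counts)  # {'spark': 3, 'druid': 2}
```

Because `add` is associative, Spark can also pre-combine values on each partition before the shuffle, which is one reason `reduceByKey` is usually preferred over grouping first and summing afterwards.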

Druid and Apache Spark are both primarily classified as "Big Data" tools.

"Real Time Aggregations" is the primary reason developers cite for choosing Druid over its competitors, whereas "Open-source" was stated as the key factor in picking Apache Spark.

Druid and Apache Spark are both open-source tools. With 22.5K GitHub stars and 19.4K forks, Apache Spark appears to be more popular than Druid, which has 8.31K stars and 2.08K forks.

Uber Technologies, Slack, and Shopify are some of the popular companies that use Apache Spark, whereas Druid is used by Airbnb, Instacart, and Dial Once. Apache Spark also has broader adoption: it is mentioned in 266 company stacks and 112 developer stacks, compared with 24 company stacks and 12 developer stacks for Druid.


What are some alternatives to Druid and Apache Spark?
HBase
Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
Cassandra
Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.
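The automatic repartitioning described above rests on a token ring: each node owns a range of hash values, and a row goes to the first node whose token is at or above the hash of its partition key. The sketch below is a deliberately simplified consistent-hashing ring, not Cassandra's implementation. It assumes one token per node (no virtual nodes), no replication, and an MD5-based token loosely in the spirit of Cassandra's older RandomPartitioner (the current default partitioner uses Murmur3). Node and key names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


def token(key: str) -> int:
    # Simplified MD5-based token; real Cassandra defaults to Murmur3.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Toy consistent-hashing ring: one token per node, no replication."""

    def __init__(self) -> None:
        self._tokens: List[int] = []       # sorted node tokens
        self._owners: Dict[int, str] = {}  # node token -> node name

    def add_node(self, name: str) -> None:
        t = token(name)
        bisect.insort(self._tokens, t)
        self._owners[t] = name

    def remove_node(self, name: str) -> None:
        t = token(name)
        self._tokens.remove(t)
        del self._owners[t]

    def node_for(self, key: str) -> str:
        if not self._tokens:
            raise LookupError("ring has no nodes")
        # First node token >= key token owns the key; wrap around past the end.
        i = bisect.bisect_left(self._tokens, token(key))
        if i == len(self._tokens):
            i = 0
        return self._owners[self._tokens[i]]


ring = TokenRing()
for node in ["node-a", "node-b", "node-c"]:
    ring.add_node(node)
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved to node-d")
```

The property this shows is why repartitioning is cheap: adding a node only moves the keys in the range it takes over, and every other key stays where it was.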
MongoDB
MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.
Prometheus
Prometheus is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.
Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats and Logstash are the Elastic Stack (sometimes called the ELK Stack).
How developers use Druid and Apache Spark
Wei Chen uses Apache Spark

Spark is good at managing parallel data processing. We wrote a neat program to handle the terabytes of data we get every day.

Ralic Lo uses Apache Spark

Used the Spark DataFrame API in SparkR for big data analysis.

Kalibrr uses Apache Spark

We use Apache Spark in computing our recommendations.

Dotmetrics uses Apache Spark

Big data analytics and nightly transformation jobs.

brenoinojosa uses Apache Spark

Data retrieval from, and analysis of, Cassandra.
