Druid vs Impala: What are the differences?
Druid: Fast column-oriented distributed data store. Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized data sets. It supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
Impala: Real-time Query for Hadoop. Impala is a modern, open source, MPP SQL query engine for Apache Hadoop. Impala is shipped by Cloudera, MapR, and Amazon. With Impala, you can query data stored in HDFS or Apache HBase in real time, using SELECT, JOIN, and aggregate functions.
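To make the contrast between the two query styles concrete, here is a minimal sketch of the same question ("hourly click totals and approximate distinct users for one country") expressed once as a Druid native JSON query and once as Impala SQL. The datasource/table name `events` and the columns `country`, `clicks`, `user_id`, and `event_time` are hypothetical, and the helper `build_druid_query` is illustrative only. The code builds the query payloads and does not contact either system.

```python
import json

def build_druid_query(datasource, interval, country):
    """Build a Druid native 'timeseries' query as a plain dict.

    All field names used here (country, clicks, user_id) are hypothetical.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        # ISO 8601 interval, e.g. "start/end"
        "intervals": [interval],
        # Selector filter: keep only rows where country equals the given value
        "filter": {"type": "selector", "dimension": "country", "value": country},
        "aggregations": [
            # Exact sum over a numeric metric column
            {"type": "longSum", "name": "total_clicks", "fieldName": "clicks"},
            # Approximate distinct count over a dimension
            {"type": "cardinality", "name": "approx_users", "fields": ["user_id"]},
        ],
    }

# A roughly analogous Impala query; NDV() is Impala's approximate
# distinct-count function. Table and column names are hypothetical.
IMPALA_SQL = """
SELECT TRUNC(event_time, 'HH') AS hour,
       SUM(clicks)             AS total_clicks,
       NDV(user_id)            AS approx_users
FROM events
WHERE country = 'US'
  AND event_time >= '2024-01-01'
  AND event_time <  '2024-01-02'
GROUP BY TRUNC(event_time, 'HH')
ORDER BY hour
"""

if __name__ == "__main__":
    query = build_druid_query(
        "events", "2024-01-01T00:00:00Z/2024-01-02T00:00:00Z", "US"
    )
    print(json.dumps(query, indent=2))
    print(IMPALA_SQL)
```

In practice the Druid payload would be sent as JSON to a Druid query endpoint, and the SQL would be submitted through an Impala client. Both steps are left out here because they depend on the deployment.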
Druid and Impala are both primarily classified as "Big Data" tools.
"Real Time Aggregations" is the primary reason developers give for choosing Druid over its competitors, whereas "Super fast" is stated as the key factor in picking Impala.
Druid and Impala are both open source tools. Druid, with 8.22K GitHub stars and 2.05K forks, appears to be more popular than Impala, which has 2.17K GitHub stars and 825 forks.
Instacart, Airbnb, and Dial Once are some of the popular companies that use Druid, whereas Impala is used by Stripe, 37 Signals, and Expedia.com. Druid has broader approval, being mentioned in 24 company stacks and 12 developer stacks, compared to Impala, which is listed in 15 company stacks and 5 developer stacks.