Apache Drill vs Impala: What are the differences?
Developers describe Apache Drill as "Schema-Free SQL Query Engine for Hadoop and NoSQL". Apache Drill is a distributed MPP (massively parallel processing) query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.
Impala, on the other hand, is described as "Real-time Query for Hadoop". Impala is a modern, open source MPP SQL query engine for Apache Hadoop, shipped by Cloudera, MapR, and Amazon. With Impala, you can run queries in real time, including SELECT, JOIN, and aggregate functions, against data stored in HDFS or Apache HBase.
Apache Drill can be classified as a tool in the "Database Tools" category, while Impala is grouped under "Big Data Tools".
Some of the features offered by Apache Drill are:
- Low-latency SQL queries
- Dynamic queries on self-describing data in files (such as JSON, Parquet, text) and MapR-DB/HBase tables, without requiring metadata definitions in the Hive metastore.
- ANSI SQL
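To make the "dynamic queries on self-describing data" feature concrete, here is a minimal sketch of sending a SQL query over a raw JSON file to Drill's REST endpoint. It assumes a Drillbit running locally with its web interface on port 8047 and the `dfs` storage plugin enabled; the file path `/data/customers.json`, its fields, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Drillbit exposing the REST API on its default web port.
DRILL_URL = "http://localhost:8047/query.json"

# Hypothetical JSON file and fields. No table definition or Hive metastore
# entry is created first; Drill reads the structure from the file itself.
SQL = (
    "SELECT t.name, t.address.city AS city "
    "FROM dfs.`/data/customers.json` AS t "
    "LIMIT 10"
)


def build_drill_request(sql, url=DRILL_URL):
    """Build (but do not send) the POST request for a Drill SQL query."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL):
    """Send the query and return the result rows as a list of dicts."""
    with urllib.request.urlopen(build_drill_request(sql, url)) as resp:
        return json.load(resp).get("rows", [])
```

Calling `run_query(SQL)` against a running Drillbit would return the matching rows; the nested `address.city` field is addressed directly in the query, which is the point of the schema-free approach.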
On the other hand, Impala provides the following key features:
- Do BI-style Queries on Hadoop
- Unify Your Infrastructure
- Implement Quickly
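As an illustration of a BI-style query on Impala, the sketch below runs a JOIN with aggregates through the third-party `impyla` client. It assumes an Impala daemon reachable on port 21050 and that `impyla` is installed; the `orders` and `customers` tables, their columns, and the helper name are hypothetical.

```python
# Hypothetical tables: orders(customer_id, amount) and customers(id, region).
IMPALA_SQL = """
SELECT c.region, COUNT(*) AS orders, SUM(o.amount) AS revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region
ORDER BY revenue DESC
"""


def run_impala_query(sql, host="localhost", port=21050):
    """Run a query against an Impala daemon and return all result rows."""
    # Imported here so the module loads even where impyla is not installed
    # (assumption: installed with `pip install impyla`).
    from impala.dbapi import connect

    conn = connect(host=host, port=port)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```

Unlike the Drill case, Impala expects these tables to be defined in the metastore before they can be queried.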
More than 2 developers cite "NoSQL and Hadoop" as their top reason for liking Apache Drill, while more than 7 cite "Super fast" as the main reason for choosing Impala.
Impala is an open source tool with 2.18K GitHub stars and 824 GitHub forks.